Ransomware classification using patch-based CNN and self-attention network on embedded N-grams of opcodes
Wentao Xiao, Bin Zhang, Xi Xiao, Weizhe Zhang, Jiajia Zhang
Published in Future Generation Computer Systems (Elsevier BV), 2020
Volume: 110
Pages: 708-720
Abstract
Ransomware is a special kind of malware that causes irreversible data loss and incurs enormous economic costs, making its detection an urgent task. Furthermore, to mount appropriate defenses and reduce analysts' workloads, ransomware must not only be detected but also classified into families. Some ransomware, e.g., fingerprinting ransomware, can fingerprint the run-time environment and thereby evade dynamic analysis. To detect this type of ransomware, and to process samples faster than dynamic analysis allows, we propose a static analysis framework based on N-gram opcodes with deep learning. Since opcode sequences extracted from executable files carry rich context and semantic information, we view each opcode sequence from a natural-language perspective, treating it like a sentence. However, the lengths of the N-gram opcode sequences vary widely, from hundreds to millions of tokens; the extremely long sequences are far beyond the capacity of most deep-neural-network-based sequence classifiers, such as RNNs. To address this problem and improve the scalability of our framework, we partition each N-gram sequence into patches and feed every patch into a self-attention-based convolutional neural network named SA-CNN. The outputs of the SA-CNNs are then concatenated and passed to a bi-directional self-attention network to produce the ransomware classification result. Compared with CNNs and RNNs, the self-attention mechanism captures complementary information about distance-aware dependencies with high computational efficiency. To the best of our knowledge, we are the first to exploit the self-attention mechanism on opcode sequences for ransomware classification. With the partition strategy and the power of the self-attention network, the framework captures rich context and semantic information from extremely long sequences.
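As a concrete illustration of the preprocessing described in the abstract, the sketch below builds N-gram opcode tokens from a disassembled opcode sequence and partitions the (possibly very long) token sequence into fixed-length patches of the kind a per-patch classifier such as the paper's SA-CNN could consume. The function names, the patch length, and the padding token are illustrative assumptions, not the authors' implementation.

```python
def opcode_ngrams(opcodes, n=3):
    """Slide a window of size n over the opcode list, joining each
    window into a single N-gram token (e.g. 'push-mov-call')."""
    return ["-".join(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]

def partition_into_patches(tokens, patch_len=4, pad="<pad>"):
    """Split a token sequence into equal-length patches, padding the
    final patch so every patch has exactly patch_len tokens."""
    patches = [tokens[i:i + patch_len] for i in range(0, len(tokens), patch_len)]
    if patches and len(patches[-1]) < patch_len:
        patches[-1] = patches[-1] + [pad] * (patch_len - len(patches[-1]))
    return patches

# Toy opcode sequence; real sequences range from hundreds to millions
# of opcodes, which is exactly why the patch partition is needed.
ops = ["push", "mov", "call", "xor", "jmp", "ret", "mov", "pop"]
grams = opcode_ngrams(ops, n=2)               # 7 bigram tokens
patches = partition_into_patches(grams, patch_len=3)
print(len(grams), len(patches))               # 7 3
```

Each patch is short enough for a fixed-size network, and the per-patch outputs can then be concatenated for the sequence-level classifier, as the abstract describes.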
Comprehensive experiments on a real-world dataset show that the proposed framework outperforms state-of-the-art methods on many evaluation metrics. © 2019 Elsevier B.V.
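The self-attention mechanism the framework relies on can be sketched in its standard scaled dot-product form. The identity projections below are a simplification (a real layer learns separate query, key, and value weight matrices), so this is a minimal sketch of the general mechanism, not the authors' SA-CNN or bi-directional attention network.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a patch of token
    embeddings X with shape (seq_len, d). Every output row is a
    softmax-weighted mixture of all input rows, which is how the
    mechanism captures dependencies at any distance in one step."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row softmax
    return weights @ X                           # context-mixed embeddings

X = np.random.default_rng(0).normal(size=(5, 8))  # 5 tokens, 8-dim embeddings
out = self_attention(X)
print(out.shape)                                  # (5, 8)
```

Because every token attends to every other token directly, the cost is one matrix product rather than the sequential steps an RNN needs, which matches the abstract's point about computational efficiency on long-range dependencies.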
About the journal
Journal: Future Generation Computer Systems
Publisher: Elsevier BV
ISSN: 0167-739X
Open Access: No