Malware Classification Based on System Call Sequences Using Deep Learning

Volume 5, Issue 4, Page No 207-216, 2020

Authors: Rizki Jaka Maulana, Gede Putra Kusuma^a)


Computer Science Department, BINUS Graduate Program, Bina Nusantara University, Jakarta, 11480, Indonesia

^a)Author to whom correspondence should be addressed. E-mail: inegara@binus.edu

Adv. Sci. Technol. Eng. Syst. J. 5(4), 207-216 (2020); DOI: 10.25046/aj050426

Keywords: Malware Classification, Malware Detection, System Call Sequence, Deep Learning, LSTM Model


Abstract

Malware remains a major problem for companies, government agencies, and individuals because attackers still use it as a primary tool to influence networks, applications, and computer operating systems for their own benefit. Heuristic and signature-based detection methods still struggle to keep up with the evolution of malware. Machine learning can automate the work needed to detect both existing and newly discovered malware families. Unfortunately, machine learning methods based on the Support Vector Machine (SVM) reach only a low level of accuracy in malware detection. In this work, we propose a dynamic analysis method that uses system call sequences to monitor malware behavior. It uses the word2vec technique for word embedding and implements two deep learning models, Long Short-Term Memory (LSTM) and Nested LSTM, as classifiers. For comparison with an existing machine learning approach, we also apply an SVM as a benchmark method. The Nested LSTM achieves an accuracy of 93.11%, while the LSTM achieves the best accuracy of 98.61%. The LSTM also achieves the best average precision (97.57%), average recall (97.29%), and average F1 score (97.43%). We find that our model is lightweight yet detects malware with high accuracy.
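The abstract reports accuracy alongside average precision, recall, and F1, but it does not state how the averages are taken. As a minimal sketch, assuming macro averaging (an unweighted mean of the per-class scores), the four metrics can be computed from true and predicted family labels as follows. The function name `macro_metrics` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumption: macro averaging, i.e. each class contributes equally
    to the average regardless of how many samples it has.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    def safe_div(num, den):
        # Treat an undefined ratio (zero denominator) as 0.0.
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        prec = safe_div(tp[label], tp[label] + fp[label])
        rec = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(safe_div(2 * prec * rec, prec + rec))

    n = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": safe_div(sum(precisions), n),
        "recall": safe_div(sum(recalls), n),
        "f1": safe_div(sum(f1s), n),
    }


# Hypothetical toy labels for two malware families (not data from the paper).
y_true = ["trojan", "trojan", "trojan", "trojan", "worm", "worm"]
y_pred = ["trojan", "trojan", "trojan", "trojan", "worm", "trojan"]
scores = macro_metrics(y_true, y_pred)
print(scores)
```

In this toy case the accuracy (5/6) is higher than the macro-averaged recall (0.75), because the single error falls in the smaller class. Under the macro-averaging assumption, an accuracy above the average recall is what one would expect when classes are imbalanced.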

Received: 27 March 2020, Accepted: 06 June 2020, Published Online: 22 July 2020

