Dynamic Analysis of Executables to Detect and Characterize Malware

Michael R. Smith; Joe B. Ingram; Christopher C. Lamb; Timothy J. Draelos; Justin E. Doak; James B. Aimone; Conrad D. James

arXiv:1711.03947·cs.CR·October 1, 2018

TL;DR

This paper explores machine learning techniques applied to system call data for malware detection, emphasizing generalization to new malware and providing insights for forensic analysis.

Contribution

It compares multiple machine learning algorithms for malware detection based on system calls and analyzes their robustness to concept drift in operational environments.

Findings

1. Achieved 90-95% class-averaged accuracy in malware detection

2. Identified significant variations in precision and recall in real-world scenarios

3. Provided insights into malware features for forensic analysis

Abstract

Malware detection is needed to ensure the integrity of systems that process sensitive information and control many aspects of everyday life. We examine the use of machine learning algorithms to detect malware using the system calls generated by executables, which alleviates attempts at obfuscation because the behavior is monitored rather than the bytes of an executable. We examine several machine learning techniques for detecting malware, including random forests, deep learning techniques, and liquid state machines. The experiments examine the effects of concept drift on each algorithm to understand how well the algorithms generalize to novel malware samples by testing them on data that was collected after the training data. The results suggest that each of the examined machine learning algorithms is a viable solution to detect malware, achieving between 90% and 95% class-averaged accuracy (CAA). In real-world…
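The abstract's evaluation setup (train on earlier data, test on data collected later, score with class-averaged accuracy) can be illustrated with a minimal sketch. It assumes CAA means the unweighted mean of per-class accuracy, which this listing does not define; the function names, the `collected` field, and the toy labels are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from datetime import date


def class_averaged_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracy (assumed definition of CAA)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    # Each class contributes equally, regardless of how many samples it has.
    return sum(correct[c] / total[c] for c in total) / len(total)


def temporal_split(samples, cutoff):
    """Train on samples collected before `cutoff`, test on the rest.

    Testing only on later data is one way to probe concept drift.
    """
    train = [s for s in samples if s["collected"] < cutoff]
    test = [s for s in samples if s["collected"] >= cutoff]
    return train, test


# Toy, made-up labels: 8 benign samples and 2 malware samples.
y_true = ["benign"] * 8 + ["malware"] * 2
y_pred = ["benign"] * 8 + ["malware", "benign"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
caa = class_averaged_accuracy(y_true, y_pred)
print(plain_accuracy, caa)  # 0.9 0.75

# Hypothetical samples with collection dates.
samples = [
    {"name": "a", "collected": date(2016, 5, 1)},
    {"name": "b", "collected": date(2016, 11, 3)},
    {"name": "c", "collected": date(2017, 2, 14)},
]
train, test = temporal_split(samples, cutoff=date(2017, 1, 1))
```

In the toy labels, plain accuracy is 0.9 while the class-averaged figure is 0.75, because the one missed malware sample halves the accuracy on the smaller class. This is why a class-averaged metric is often preferred when benign samples greatly outnumber malicious ones.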
