Detecting Malware with Information Complexity
Nadia Alshahwan, Earl T. Barr, David Clark, George Danezis

TL;DR
This paper presents a malware detection method that applies normalized compression distance (NCD) directly to binaries. It achieves 97.1% accuracy without malware-specific knowledge and outperforms the 59 anti-malware programs on VirusTotal.
Contribution
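The study introduces a novel application of NCD to malware detection. The method is simple: it needs only off-the-shelf compressors, a standard machine learning classifier, and a labelled zoo of malware and benign-ware, with no prior malware-specific knowledge.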
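For reference, NCD as commonly defined in the literature is given below. This summary does not reproduce the paper's exact formulation, so treat this as the standard definition rather than the authors' variant. Here $C(z)$ is the length in bytes of $z$ after compression with some real compressor, and $xy$ is the concatenation of $x$ and $y$.

```latex
\mathrm{NCD}(x, y) = \frac{C(xy) - \min\{C(x), C(y)\}}{\max\{C(x), C(y)\}}
```

Values near 0 mean the compressor finds a lot of shared structure between the two binaries. Values near 1 mean it finds little; with real compressors the result can slightly exceed 1.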
Findings
Achieves 97.1% accuracy in malware classification, with a 3% false positive rate
Outperforms 59 anti-malware programs on VirusTotal
Combining NCD with compressibility rates further improves accuracy
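The following minimal sketch shows one way NCD and compressibility could be combined into a feature vector for an off-the-shelf classifier. It rests on several assumptions. The choice of zlib at level 9 is one example of an off-the-shelf compressor. The `compressibility` definition (compressed size over raw size) is a hypothetical reading of "compressibility rate". The three-feature layout produced by the hypothetical `features` helper is illustrative and is not taken from the paper.

```python
import zlib
from typing import List, Sequence


def clen(data: bytes) -> int:
    """C(x): compressed size in bytes (zlib at level 9 is an assumed choice)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalised compression distance between two byte strings."""
    cx, cy = clen(x), clen(y)
    return (clen(x + y) - min(cx, cy)) / max(cx, cy)


def compressibility(data: bytes) -> float:
    """Hypothetical compressibility rate: compressed size / raw size.
    Lower values mean the binary compresses well."""
    if not data:
        raise ValueError("data must be non-empty")
    return clen(data) / len(data)


def features(suspect: bytes, malware: Sequence[bytes],
             benign: Sequence[bytes]) -> List[float]:
    """Hypothetical feature vector: nearest NCD to each zoo, plus the
    suspect's own compressibility rate."""
    return [
        min(ncd(suspect, m) for m in malware),
        min(ncd(suspect, b) for b in benign),
        compressibility(suspect),
    ]
```

Vectors like these, computed for every labelled sample, could then be fed to any standard classifier. Which classifier the authors used is not stated in this summary.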
Abstract
This work focuses on a specific front of the malware detection arms-race, namely the detection of persistent, disk-resident malware. We exploit normalised compression distance (NCD), an information theoretic measure, applied directly to binaries. Given a zoo of labelled malware and benign-ware, we ask whether a suspect program is more similar to our malware or to our benign-ware. Our approach classifies malware with 97.1% accuracy and a false positive rate of 3%. We achieve our results with off-the-shelf compressors and a standard machine learning classifier and without any specialised knowledge. An end-user need only collect a zoo of malware and benign-ware and then can immediately apply our techniques. We apply statistical rigour to our experiments and our selection of data. We demonstrate that accuracy can be optimised by combining NCD with the compressibility rates of the…
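The abstract poses classification as a question: is a suspect program more similar to the malware zoo or to the benign-ware zoo? The sketch below answers that question with a 1-nearest-neighbour rule under NCD. This rule is an illustrative stand-in and not the paper's classifier. The compressor (zlib, level 9) and the helper names `clen`, `ncd`, and `classify` are assumptions.

```python
import zlib
from typing import Sequence


def clen(data: bytes) -> int:
    """C(x): compressed length in bytes, using zlib at its highest level."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalised compression distance between two byte strings."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(suspect: bytes, malware: Sequence[bytes],
             benign: Sequence[bytes]) -> str:
    """Label `suspect` with the class of its nearest zoo member under NCD
    (a hypothetical 1-nearest-neighbour rule, used only for illustration)."""
    d_mal = min(ncd(suspect, m) for m in malware)
    d_ben = min(ncd(suspect, b) for b in benign)
    return "malware" if d_mal < d_ben else "benign"
```

Zoo members are concatenated with the suspect, so binaries larger than the compressor's window (32 KiB for zlib) weaken the shared-structure signal. Compressors with larger windows, such as `lzma`, are a natural substitute in that case.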
