Detecting Malware with Information Complexity

Nadia Alshahwan; Earl T. Barr; David Clark; George Danezis

arXiv:1502.07661 · cs.CR · February 27, 2015

TL;DR

This paper presents a malware detection method that applies normalized compression distance (NCD) directly to binaries. It achieves 97.1% accuracy without specialized knowledge and outperforms 59 anti-malware programs on VirusTotal.
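For reference, the standard definition of NCD from the compression-distance literature is given below. C(x) is the length of x after compression by a real compressor, and xy is the concatenation of x and y. Values near 0 indicate that the two inputs share most of their information; values near 1 indicate they are largely unrelated. Exactly which compressors and normalization details the paper uses is not stated in this summary.

```latex
\mathrm{NCD}(x, y) = \frac{C(xy) - \min\{C(x),\, C(y)\}}{\max\{C(x),\, C(y)\}}
```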

Contribution

The study introduces a novel application of NCD for malware detection that is simple, effective, and does not require prior malware-specific knowledge.

Findings

01  Achieves 97.1% accuracy in malware classification, with a 3% false positive rate

02  Outperforms 59 anti-malware programs on VirusTotal

03  Improves accuracy further by combining NCD with compressibility rates

Abstract

This work focuses on a specific front of the malware detection arms-race, namely the detection of persistent, disk-resident malware. We exploit normalised compression distance (NCD), an information theoretic measure, applied directly to binaries. Given a zoo of labelled malware and benign-ware, we ask whether a suspect program is more similar to our malware or to our benign-ware. Our approach classifies malware with 97.1% accuracy and a false positive rate of 3%. We achieve our results with off-the-shelf compressors and a standard machine learning classifier and without any specialised knowledge. An end-user need only collect a zoo of malware and benign-ware and then can immediately apply our techniques. We apply statistical rigour to our experiments and our selection of data. We demonstrate that accuracy can be optimised by combining NCD with the compressibility rates of the…
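The core question in the abstract, whether a suspect binary is closer to the malware zoo or to the benign-ware zoo, can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes zlib as the off-the-shelf compressor. It uses a 1-nearest-neighbour rule as a hypothetical stand-in for the paper's unspecified "standard machine learning classifier". The names `clen`, `ncd`, `compressibility`, `features` and `classify` are invented for this example.

```python
import zlib
from typing import List, Sequence

# Illustrative sketch only: function names are hypothetical, and zlib is an
# assumed choice of off-the-shelf compressor.


def clen(data: bytes) -> int:
    """C(x): length in bytes of data compressed with zlib at level 9."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def compressibility(x: bytes) -> float:
    """Compressed size divided by raw size (lower means more compressible)."""
    if not x:
        raise ValueError("empty input")
    return clen(x) / len(x)


def features(suspect: bytes, malware_zoo: Sequence[bytes],
             benign_zoo: Sequence[bytes]) -> List[float]:
    """Hypothetical feature vector: nearest NCD to each zoo, plus compressibility."""
    return [
        min(ncd(suspect, m) for m in malware_zoo),
        min(ncd(suspect, b) for b in benign_zoo),
        compressibility(suspect),
    ]


def classify(suspect: bytes, malware_zoo: Sequence[bytes],
             benign_zoo: Sequence[bytes]) -> str:
    """1-nearest-neighbour rule: label the suspect after the zoo holding its closest match."""
    d_mal, d_ben, _ = features(suspect, malware_zoo, benign_zoo)
    return "malware" if d_mal < d_ben else "benign"


if __name__ == "__main__":
    # Toy byte strings standing in for real binaries.
    malware = [b"EVILPAYLOAD" * 50]
    benign = [b"The quick brown fox jumps over the lazy dog. " * 20]
    print(classify(b"EVILPAYLOAD" * 40, malware, benign))
```

The 1-NN rule ignores the compressibility feature. To follow the abstract more closely, feed the full `features(...)` vectors into a trained classifier, which lets compressibility contribute alongside the two NCD scores. Real binaries are far larger than these toy inputs. Note also that a compressor's window size limits how much shared content NCD can detect across a concatenation.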
