Identifying Authorship Style in Malicious Binaries: Techniques, Challenges & Datasets
Jason Gray, Daniele Sgandurra, Lorenzo Cavallaro

TL;DR
This paper surveys methods for attributing malicious binaries to their authors, discusses adversarial challenges, and introduces a large dataset to advance research in binary authorship attribution.
Contribution
It provides a comprehensive review of techniques for identifying malicious author style, examines how adversarial techniques affect them, and releases a large, diverse dataset for future research.
Findings
Adversarial techniques significantly hinder attribution accuracy.
A key open challenge is the lack of ground-truth datasets.
The released dataset facilitates benchmarking and development of new methods.
Abstract
Attributing a piece of malware to its creator typically requires threat intelligence. Binary attribution increases the level of difficulty as it mostly relies upon the ability to disassemble binaries to identify authorship style. Our survey explores malicious author style and the adversarial techniques that authors use to remain anonymous. We examine the adversarial impact on the state-of-the-art methods. We identify key findings and explore the open research challenges. To mitigate the lack of ground truth datasets in this domain, we publish alongside this survey the largest and most diverse meta-information dataset of 15,660 malware samples labeled to 164 threat actor groups.
Taxonomy
Topics: Spam and Phishing Detection · Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection
