BinImg2Vec: Augmenting Malware Binary Image Classification with Data2Vec
Joon Sern Lee, Kai Keng Tay, Zong Fu Chua

TL;DR
BinImg2Vec enhances malware binary image classification by integrating self-supervised Data2Vec with supervised learning, leading to improved accuracy, reduced variance, and better embedding clustering for interpretability.
Contribution
This paper introduces BinImg2Vec, a novel framework combining Data2Vec self-supervised learning with supervised training for malware image classification.
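As a rough illustration of the self-supervised half of such a framework, the sketch below shows two ingredients commonly associated with Data2Vec: a teacher whose weights track the student through an exponential moving average, and a smooth L1 regression of the student's predictions onto the teacher's representations. It is not the paper's implementation. The function names, the flat lists standing in for weights and representations, and the values of `tau` and `beta` are all assumptions made for this example, and how BinImg2Vec combines this objective with the supervised loss is not stated here.

```python
from typing import List


def ema_update(teacher: List[float], student: List[float], tau: float = 0.999) -> List[float]:
    """Move each teacher weight slightly toward the matching student weight.

    tau is a hypothetical decay value; a larger tau means a slower-moving teacher.
    """
    return [tau * t + (1.0 - tau) * s for t, s in zip(teacher, student)]


def smooth_l1(pred: List[float], target: List[float], beta: float = 1.0) -> float:
    """Mean smooth L1 loss between student predictions and teacher targets.

    Quadratic for small errors (below beta), linear for large ones.
    """
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff * diff / beta
        else:
            total += diff - 0.5 * beta
    return total / len(pred)


# Toy values only: the teacher drifts toward the student after one update.
new_teacher = ema_update([1.0, 2.0], [0.0, 0.0], tau=0.9)
loss = smooth_l1([0.5, 3.0], [0.0, 0.0])
```

In a real model the lists would be network parameters and per-patch feature vectors, and the student would only see a masked version of the input image.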
Findings
Achieved a 4% increase in classification accuracy.
Reduced performance variance by 0.5% across multiple runs.
Produced well-clustered embeddings for better model explainability.
Abstract
Rapid digitalisation spurred by the COVID-19 pandemic has resulted in more cyber crime. Malware-as-a-service is now a booming business for cyber criminals. With the surge in malware activity, it is vital for cyber defenders to understand more about the malware samples they have at hand, as such information can greatly influence their next course of action during a breach. Recently, researchers have shown how malware family classification can be done by first converting malware binaries into grayscale images and then passing them through neural networks for classification. However, most work focuses on studying the impact of different neural network architectures on classification performance. In the last year, researchers have shown that augmenting supervised learning with self-supervised learning can improve performance. Even more recently, Data2Vec was proposed as a modality agnostic…
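The binary-to-image conversion mentioned in the abstract can be sketched as follows. This is a minimal illustration, assuming the common convention of treating each byte as one 8-bit pixel and wrapping the byte stream into fixed-width rows. The function name `bytes_to_grayscale`, the default width, and the zero padding of the last row are assumptions for this example, not details taken from the paper.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Map each byte (0-255) to one pixel and wrap the stream into rows of `width` pixels."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Zero-pad so that the final row is complete (an assumption of this sketch).
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


# Hypothetical five-byte sample, wrapped into rows of four pixels.
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x03])
image = bytes_to_grayscale(sample, width=4)
# image == [[77, 90, 144, 0], [3, 0, 0, 0]]
```

The resulting rows can be handed to any image library or tensor framework. In practice the width is often chosen based on file size, and the image is resized to the input resolution the classifier expects.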
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Advanced Malware Detection Techniques · Network Security and Intrusion Detection
