MalMixer: Few-Shot Malware Classification with Retrieval-Augmented Semi-Supervised Learning
Jiliang Li, Yifan Zhang, Yu Huang, Kevin Leach

TL;DR
MalMixer is a semi-supervised malware classification method that leverages domain-knowledge-aware data augmentation to achieve high accuracy with limited labeled samples, addressing the challenge of rapidly evolving malware.
Contribution
The paper introduces MalMixer, a novel semi-supervised malware classifier that uses domain-knowledge-aware data augmentation for effective few-shot learning.
Findings
MalMixer outperforms existing methods in few-shot malware classification.
Domain-knowledge-aware data augmentation improves classifier accuracy.
MalMixer reduces the need for extensive manual malware analysis.
Abstract
Recent growth and proliferation of malware have tested practitioners' ability to promptly classify new samples according to malware families. In contrast to labor-intensive reverse engineering efforts, machine learning approaches have demonstrated increased speed and accuracy. However, most existing deep-learning malware family classifiers must be calibrated using a large number of samples that are painstakingly manually analyzed before training. Furthermore, as novel malware samples arise that are beyond the scope of the training set, additional reverse engineering effort must be employed to update the training set. The sheer volume of new samples found in the wild creates substantial pressure on practitioners' ability to reverse engineer enough malware to adequately train modern classifiers. In this paper, we present MalMixer, a malware family classifier using semi-supervised learning…
Taxonomy
Topics: Advanced Malware Detection Techniques · Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
