Multifamily Malware Models
Samanvitha Basole, Fabio Di Troia, Mark Stamp

TL;DR
This paper investigates how training data diversity affects malware detection accuracy, demonstrating that neighborhood-based algorithms generalize well and can reliably detect multiple malware families with a single model.
Contribution
It provides empirical evidence on the relationship between dataset generality and model accuracy, highlighting the effectiveness of neighborhood-based algorithms for multi-family malware detection.
Findings
Neighborhood-based algorithms outperform other techniques in generalizing across malware families.
Single models trained on diverse datasets can effectively detect multiple malware families.
Byte n-gram features are useful for malware classification.
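As a rough illustration of what a byte n-gram feature is, the sketch below counts overlapping n-byte windows in a binary blob and normalizes them to relative frequencies. The function names, the choice of n = 2, and the toy input bytes are assumptions made for this example; the summary does not state which values of n or which preprocessing the authors used.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams in a binary blob (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of n bytes across the data, one byte at a time.
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def relative_frequencies(counts: Counter) -> dict:
    """Turn raw n-gram counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


# Toy input for illustration only, not a real malware sample.
sample = bytes([0x90, 0x90, 0x90, 0xC3])
counts = byte_ngrams(sample, n=2)
freqs = relative_frequencies(counts)
```

Each distinct n-gram becomes one dimension of the feature vector, so the vector length grows quickly with n; in practice some form of feature selection is usually needed.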
Abstract
When training a machine learning model, there is likely to be a tradeoff between accuracy and the diversity of the dataset. Previous research has shown that if we train a model to detect one specific malware family, we generally obtain stronger results as compared to a case where we train a single model on multiple diverse families. However, during the detection phase, it would be more efficient to have a single model that can reliably detect multiple families, rather than having to score each sample against multiple models. In this research, we conduct experiments based on byte n-gram features to quantify the relationship between the generality of the training dataset and the accuracy of the corresponding machine learning models, all within the context of the malware detection problem. We find that neighborhood-based algorithms generalize surprisingly well, far outperforming the…
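To make the "single model for multiple families" idea concrete, here is a minimal sketch that uses k-nearest neighbors as one example of a neighborhood-based algorithm. It is not the authors' implementation: the two-dimensional feature vectors, the family groupings, the Euclidean distance, and k = 3 are all assumptions chosen to keep the example small. Samples from two hypothetical families share a single "malware" label, so one model scores every query.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy 2-D feature vectors (e.g. two n-gram frequencies); purely illustrative.
train = [
    ((0.90, 0.10), "malware"),  # hypothetical family A
    ((0.80, 0.20), "malware"),  # hypothetical family A
    ((0.10, 0.90), "malware"),  # hypothetical family B
    ((0.20, 0.80), "malware"),  # hypothetical family B
    ((0.50, 0.50), "benign"),
    ((0.45, 0.55), "benign"),
    ((0.55, 0.45), "benign"),
]

pred_a = knn_predict(train, (0.85, 0.15))  # lies near family A
pred_b = knn_predict(train, (0.15, 0.85))  # lies near family B
pred_c = knn_predict(train, (0.50, 0.50))  # lies near the benign cluster
```

Because the vote depends only on a query's local neighborhood, the two families do not need to resemble each other for both to be flagged, which is one intuition for why such methods might tolerate a diverse training set.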
Taxonomy
Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Anomaly Detection Techniques and Applications
