Reviewer Integration and Performance Measurement for Malware Detection
Brad Miller, Alex Kantchelian, Michael Carl Tschantz, Sadia Afroz, Rekha Bachwani, Riyaz Faizullabhoy, Ling Huang, Vaishaal Shankar, Tony Wu, George Yiu, Anthony D. Joseph, J. D. Tygar

TL;DR
This paper presents a large-scale malware detection system that combines machine learning with a limited number of expert reviews to improve detection rates, and it analyzes how the timing of training labels affects measured performance.
Contribution
It introduces a hybrid malware detection approach integrating limited expert reviews and highlights the effect of label timing on detection accuracy.
Findings
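Even a small number of expert reviews substantially improves detection.
Evaluating with labels obtained long after samples first appear inflates measured detection.
With a budget of 80 reviews per day, the system detects 89% of malware, up from 72% without reviewer assistance.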
Abstract
We present and evaluate a large-scale malware detection system integrating machine learning with expert reviewers, treating reviewers as a limited labeling resource. We demonstrate that even in small numbers, reviewers can vastly improve the system's ability to keep pace with evolving threats. We conduct our evaluation on a sample of VirusTotal submissions spanning 2.5 years and containing 1.1 million binaries with 778GB of raw feature data. Without reviewer assistance, we achieve 72% detection at a 0.5% false positive rate, performing comparably to the best vendors on VirusTotal. Given a budget of 80 accurate reviews daily, we improve detection to 89% and are able to detect 42% of malicious binaries undetected upon initial submission to VirusTotal. Additionally, we identify a previously unnoticed temporal inconsistency in the labeling of training datasets. We compare the impact of…
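The abstract reports detection at a fixed false positive rate (for example, 72% detection at a 0.5% false positive rate). The sketch below shows one common way such a figure can be computed from classifier scores. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `detection_at_fpr`, the score convention (higher means more likely malicious), and the label encoding (1 = malicious, 0 = benign) are all hypothetical choices made here.

```python
import math


def detection_at_fpr(scores, labels, max_fpr=0.005):
    """Return (threshold, detection_rate, false_positive_rate).

    Hypothetical helper: picks the lowest score threshold whose false
    positive rate on benign samples does not exceed max_fpr, then reports
    the share of malicious samples scoring strictly above that threshold.
    Assumes labels are 1 (malicious) or 0 (benign).
    """
    benign = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    malicious = [s for s, y in zip(scores, labels) if y == 1]
    if not benign or not malicious:
        raise ValueError("need at least one benign and one malicious sample")

    # Largest number of benign samples we may flag without exceeding max_fpr.
    allowed = math.floor(max_fpr * len(benign))

    if allowed >= len(benign):
        # Every benign sample may be flagged, so flag everything.
        threshold = float("-inf")
    else:
        # At most `allowed` benign scores lie strictly above this value.
        threshold = benign[allowed]

    false_positives = sum(s > threshold for s in benign)
    detected = sum(s > threshold for s in malicious)
    return threshold, detected / len(malicious), false_positives / len(benign)


if __name__ == "__main__":
    # Toy data: four benign and three malicious samples (made-up scores).
    scores = [0.1, 0.2, 0.3, 0.9, 0.95, 0.5, 0.25]
    labels = [0, 0, 0, 0, 1, 1, 1]
    print(detection_at_fpr(scores, labels, max_fpr=0.25))
```

Fixing the false positive rate first and then reading off detection makes results comparable across systems, since each one is held to the same tolerance for flagging benign files.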
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research · Network Security and Intrusion Detection
