Empirical Quantification of Spurious Correlations in Malware Detection
Bianca Perasso, Ludovico Lozza, Andrea Ponte, Luca Demetrio, Luca Oneto, Fabio Roli

TL;DR
This paper investigates how deep learning models for malware detection rely on spurious correlations, particularly compiler artifacts, and quantifies their impact to improve model robustness and deployment readiness.
Contribution
It provides a novel analysis that quantifies how strongly malware detection models rely on compiler artifacts, and ranks two end-to-end models by their suitability for deployment.
Findings
Models rely heavily on compiler artifacts, such as empty spaces left by the compiler, which diminishes the relevance of the compiled code.
Quantified how much these spurious correlations influence model decisions.
Ranked two end-to-end models to identify the more robust option for production.
Abstract
End-to-end deep learning exhibits unmatched performance in detecting malware, but this achievement is reached by exploiting spurious correlations: features that are highly relevant at inference time but that domain knowledge shows to be useless. While previous work highlighted that deep networks mainly focus on metadata, none investigated the phenomenon further or quantified its impact on the decision. In this work, we deepen our understanding of how spurious correlations affect deep learning for malware detection by highlighting how much models rely on empty spaces left by the compiler, which diminishes the relevance of the compiled code. Through our seminal analysis on a small-scale balanced dataset, we introduce a ranking of two end-to-end models to better understand which is more suitable for production.
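The abstract does not spell out how reliance on compiler-left empty space is measured. As a hedged illustration only, and not the authors' method, one simple proxy is the share of a per-byte attribution map (for example, produced by a gradient-based explainer over a byte-level detector) that falls on long runs of filler bytes. The sketch below assumes 0x00 and 0xCC as filler values and a minimum run length of 16; the helper names `find_filler_runs`, `filler_mask`, and `attribution_share` are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Assumed filler values: 0x00 (zero padding, e.g. section alignment) and
# 0xCC (int3, used by some compilers between functions). Not taken from the paper.
FILLER_BYTES: FrozenSet[int] = frozenset({0x00, 0xCC})


def find_filler_runs(data: bytes, min_len: int = 16,
                     filler: FrozenSet[int] = FILLER_BYTES) -> List[Tuple[int, int]]:
    """Return half-open [start, end) ranges where one filler byte repeats >= min_len times."""
    runs: List[Tuple[int, int]] = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b in filler:
            j = i
            while j < n and data[j] == b:  # extend the run of this same byte value
                j += 1
            if j - i >= min_len:
                runs.append((i, j))
            i = j
        else:
            i += 1
    return runs


def filler_mask(length: int, runs: Sequence[Tuple[int, int]]) -> List[bool]:
    """Turn run ranges into a per-byte boolean mask (True = filler byte)."""
    mask = [False] * length
    for start, end in runs:
        for k in range(start, end):
            mask[k] = True
    return mask


def attribution_share(attributions: Sequence[float], mask: Sequence[bool]) -> float:
    """Fraction of total absolute attribution that lands on masked (filler) bytes."""
    if len(attributions) != len(mask):
        raise ValueError("attributions and mask must have the same length")
    total = sum(abs(a) for a in attributions)
    if total == 0:
        return 0.0
    return sum(abs(a) for a, m in zip(attributions, mask) if m) / total


if __name__ == "__main__":
    # Toy "binary": header bytes, zero padding, a few code bytes, int3 filler, ret.
    data = b"MZ" + b"\x00" * 30 + b"\x55\x8b\xec" + b"\xcc" * 20 + b"\xc3"
    runs = find_filler_runs(data)
    print(runs)  # [(2, 32), (35, 55)]
    # With uniform attributions, 50 of 56 bytes are filler, so the share is 50 / 56.
    print(attribution_share([1.0] * len(data), filler_mask(len(data), runs)))
```

A byte-level scan like this is a crude heuristic: long zero runs can also be legitimate initialized data. A PE-aware variant would instead locate slack space from the section headers, for example the bytes between a section's VirtualSize and its SizeOfRawData, and comparing the resulting share across two models would give one possible basis for ranking them.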
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research · Digital and Cyber Forensics
