TESSERACT: Eliminating Experimental Bias in Malware Classification across Space and Time (Extended Version)
Zeliang Kan, Shae McFadden, Daniel Arp, Feargus Pendlebury, Roberto Jordaney, Johannes Kinder, Fabio Pierazzi, Lorenzo Cavallaro

TL;DR
This paper introduces TESSERACT, a framework to eliminate spatial and temporal biases in malware classification experiments, enabling more realistic evaluation and improved robustness of ML models over time.
Contribution
It proposes constraints for fair experiment design, a new robustness metric AUT, and an algorithm for tuning training data, addressing biases in malware detection research.
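The AUT metric is only named here, not defined. As a rough illustration, the sketch below assumes AUT is the trapezoidal area under a per-period performance curve (for example, monthly F1), normalized by the number of intervals so that a classifier with a constant perfect score gets 1.0. The function name `aut` and the sample scores are hypothetical and not taken from the paper's code.

```python
from typing import Sequence


def aut(scores: Sequence[float]) -> float:
    """Assumed AUT form: trapezoidal area under a metric-over-time curve.

    `scores` holds one metric value (e.g. F1) per consecutive test period.
    The area is divided by the number of intervals (N - 1), so the result
    stays in [0, 1] whenever every score is in [0, 1].
    """
    n = len(scores)
    if n < 2:
        raise ValueError("AUT needs at least two test periods")
    # Sum the trapezoid between each pair of neighbouring periods.
    area = sum((scores[k] + scores[k + 1]) / 2 for k in range(n - 1))
    return area / (n - 1)


# Hypothetical monthly F1 scores showing performance decay over time.
monthly_f1 = [0.9, 0.8, 0.6, 0.5]
print(round(aut(monthly_f1), 3))
```

Under this assumed definition, a slowly decaying classifier scores higher than one that starts equally well but collapses quickly, which is the kind of robustness a single aggregate F1-score hides.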
Findings
Biases inflate previous performance results
Periodic tuning improves classifier stability
Mitigation strategies delay performance decay
Abstract
Machine learning (ML) plays a pivotal role in detecting malicious software. Despite the high F1-scores reported in numerous studies reaching upwards of 0.99, the issue is not completely solved. Malware detectors often experience performance decay due to constantly evolving operating systems and attack methods, which can render previously learned knowledge insufficient for accurate decision-making on new inputs. This paper argues that commonly reported results are inflated due to two pervasive sources of experimental bias in the detection task: spatial bias caused by data distributions that are not representative of a real-world deployment; and temporal bias caused by incorrect time splits of data, leading to unrealistic configurations. To address these biases, we introduce a set of constraints for fair experiment design, and propose a new metric, AUT, for classifier robustness in…
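The two biases described in the abstract can be illustrated with a small sketch. It assumes that avoiding temporal bias means every training sample predates every test sample, and that checking for spatial bias means comparing the malware share of the test set against a deployment estimate. The `Sample` type, field names, and dates are hypothetical; this is not the paper's implementation, and the full set of constraints is not listed in the text above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    name: str          # hypothetical identifier for an app
    first_seen: date   # timestamp used for the time-aware split
    is_malware: bool   # ground-truth label


def temporal_split(
    samples: Sequence[Sample], cutoff: date
) -> Tuple[List[Sample], List[Sample]]:
    """Split so that all training samples are strictly older than `cutoff`."""
    train = [s for s in samples if s.first_seen < cutoff]
    test = [s for s in samples if s.first_seen >= cutoff]
    return train, test


def malware_ratio(samples: Sequence[Sample]) -> float:
    """Fraction of malware in a set, to compare with the deployment ratio."""
    if not samples:
        raise ValueError("empty sample set")
    return sum(s.is_malware for s in samples) / len(samples)


# Hypothetical data: four apps observed over two years.
data = [
    Sample("a", date(2015, 3, 1), False),
    Sample("b", date(2015, 9, 1), True),
    Sample("c", date(2016, 2, 1), False),
    Sample("d", date(2016, 6, 1), True),
]
train, test = temporal_split(data, cutoff=date(2016, 1, 1))
print(len(train), len(test), malware_ratio(test))
```

A random split of the same data could place 2016 samples in the training set and 2015 samples in the test set, letting the classifier learn from the future. The time-aware split rules that out by construction.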
Taxonomy
Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Anomaly Detection Techniques and Applications
Methods: Sparse Evolutionary Training
