Beyond the TESSERACT: Trustworthy Dataset Curation for Sound Evaluations of Android Malware Classifiers
Theo Chow, Mario D'Onghia, Lorenz Linhardt, Zeliang Kan, Daniel Arp, Lorenzo Cavallaro, and Fabio Pierazzi

TL;DR
This paper emphasizes the importance of trustworthy dataset curation for evaluating Android malware classifiers, identifying overlooked factors affecting evaluation reliability and proposing a methodology to improve dataset quality.
Contribution
It introduces five novel factors influencing evaluation discrepancies and offers a methodology for curating trustworthy datasets for Android malware detection.
Findings
Performance discrepancies persist despite realistic data sources.
Five overlooked factors significantly impact evaluation outcomes.
Recommendations improve the reliability of malware classifier assessments.
Abstract
The reliability of machine learning critically depends on dataset quality. While machine learning applied to computer vision and natural language processing benefits from high-quality benchmark datasets, cyber security often falls behind, as quality is tied to the ability to access hard-to-obtain realistic data that may evolve over time. Android is, however, uniquely positioned in this ecosystem thanks to AndroZoo and other sources, which provide large-scale, continuously updated, and timestamped repositories of benign and malicious apps. Since their release, such data sources have provided access to populations of Android apps that researchers can sample from to evaluate learning-based methods in realistic settings, i.e., over temporal frames to account for app evolution (natural distribution shift) and with test datasets that reflect in-the-wild class ratios. Surprisingly, we observe that…
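The two constraints the abstract names, evaluating over temporal frames and testing at in-the-wild class ratios, can be illustrated with a minimal sketch. It is not the authors' methodology: the record fields (`timestamp`, `label`), both function names, and the 10% malware ratio are hypothetical placeholders chosen for illustration.

```python
import random
from datetime import date


def temporal_split(apps, split_date):
    """Train on apps dated strictly before split_date, test on the rest.

    Assumes each app is a dict with a `timestamp` (datetime.date) field.
    """
    train = [a for a in apps if a["timestamp"] < split_date]
    test = [a for a in apps if a["timestamp"] >= split_date]
    return train, test


def downsample_to_ratio(samples, malware_ratio, seed=0):
    """Drop malware at random so it makes up at most `malware_ratio` of the set.

    Assumes `label` is 1 for malware and 0 for benign; all benign apps are kept.
    """
    rng = random.Random(seed)
    malware = [s for s in samples if s["label"] == 1]
    benign = [s for s in samples if s["label"] == 0]
    # Largest malware count m such that m / (m + len(benign)) <= malware_ratio.
    max_malware = int(malware_ratio * len(benign) / (1 - malware_ratio))
    if len(malware) > max_malware:
        malware = rng.sample(malware, max_malware)
    return benign + malware


# Usage with made-up records: split at 2021-01-01, then cap test malware at 10%.
apps = [{"timestamp": date(2020, 6, 1), "label": i % 2} for i in range(20)]
apps += [{"timestamp": date(2021, 6, 1), "label": 0} for _ in range(90)]
apps += [{"timestamp": date(2021, 6, 1), "label": 1} for _ in range(40)]
train, test = temporal_split(apps, date(2021, 1, 1))
test = downsample_to_ratio(test, malware_ratio=0.10)
```

Splitting on a date rather than at random keeps future apps out of the training set, and capping the malware share stops the test set from overstating how often malware appears in practice.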
Taxonomy
Topics: Advanced Malware Detection Techniques · Data Stream Mining Techniques · Imbalanced Data Classification Techniques
