On the Reliability and Stability of Selective Methods in Malware Classification Tasks
Alexander Herzog, Aliai Eusebi, Lorenzo Cavallaro

TL;DR
This paper introduces Aurora, a framework for evaluating the reliability and stability of malware classifiers' confidence estimates under distribution shifts, highlighting issues in current state-of-the-art methods and emphasizing the need for more robust operational assessment.
Contribution
The paper proposes Aurora, a novel evaluation framework that assesses confidence quality and operational resilience of malware classifiers over time, addressing gaps in current evaluation practices.
Findings
State-of-the-art malware classifiers exhibit fragility under distribution shifts.
Unreliable confidence estimates reduce operational trust and efficiency.
Current evaluation metrics overlook long-term stability and reliability.
Abstract
The performance figures of modern drift-adaptive malware classifiers appear promising, but does this translate to genuine operational reliability? The standard evaluation paradigm primarily focuses on baseline performance metrics, neglecting confidence-error alignment and operational stability. While prior works established the importance of temporal evaluation and introduced selective classification in malware classification tasks, we take a complementary direction by investigating whether malware classifiers maintain reliable and stable confidence estimates under distribution shifts and exploring the tensions between scientific advancement and practical impacts when they do not. We propose Aurora, a framework to evaluate malware classifiers based on their confidence quality and operational resilience. Aurora subjects the confidence profile of a given model to verification to assess…
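To make the idea of confidence-error alignment in selective classification concrete, here is a minimal Python sketch of a risk-coverage computation: the classifier abstains below a confidence threshold, and we measure the error rate among the predictions it keeps. This illustrates the general concept only and is not Aurora's procedure; the function name `risk_coverage`, the thresholds, and the toy data are all hypothetical.

```python
import numpy as np

def risk_coverage(confidence, correct, thresholds):
    """Return (threshold, coverage, selective_risk) for each threshold.

    Hypothetical helper: predictions with confidence below the threshold
    are rejected (the classifier abstains on them).
    """
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    rows = []
    for t in thresholds:
        accepted = confidence >= t
        # Coverage: fraction of samples the classifier still decides on.
        coverage = float(accepted.mean())
        # Selective risk: error rate among accepted predictions only
        # (defined as 0.0 here when nothing is accepted).
        risk = float((~correct[accepted]).mean()) if accepted.any() else 0.0
        rows.append((float(t), coverage, risk))
    return rows

# Toy data (made up for illustration): confidence scores and whether
# each prediction turned out to be correct.
confidence = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55]
correct = [True, True, False, True, False, False]

curve = risk_coverage(confidence, correct, thresholds=[0.5, 0.75, 0.9])
for t, cov, risk in curve:
    print(f"threshold={t:.2f} coverage={cov:.2f} selective_risk={risk:.2f}")
```

If confidence is well aligned with errors, selective risk should fall as the threshold rises and coverage shrinks. Under distribution shift that relationship can break down, which is the kind of failure the abstract is concerned with.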
Taxonomy
Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Spam and Phishing Detection
Methods: Sparse Evolutionary Training
