On the Reliability and Stability of Selective Methods in Malware Classification Tasks

Alexander Herzog; Aliai Eusebi; Lorenzo Cavallaro

arXiv:2505.22843 · cs.CR · January 22, 2026

TL;DR

This paper introduces Aurora, a framework for evaluating the reliability and stability of malware classifiers' confidence estimates under distribution shift. It shows that state-of-the-art methods can be fragile in this respect and argues for more robust operational assessment.

Contribution

The paper proposes Aurora, a novel evaluation framework that assesses the confidence quality and operational resilience of malware classifiers over time. It addresses a gap in current evaluation practice, which focuses primarily on baseline performance metrics.

Findings

1. State-of-the-art malware classifiers exhibit fragility under distribution shifts.
2. Unreliable confidence estimates reduce operational trust and efficiency.
3. Current evaluation metrics overlook long-term stability and reliability.

Abstract

The performance figures of modern drift-adaptive malware classifiers appear promising, but does this translate to genuine operational reliability? The standard evaluation paradigm primarily focuses on baseline performance metrics, neglecting confidence-error alignment and operational stability. While prior works established the importance of temporal evaluation and introduced selective classification in malware classification tasks, we take a complementary direction by investigating whether malware classifiers maintain reliable and stable confidence estimates under distribution shifts and exploring the tensions between scientific advancement and practical impacts when they do not. We propose Aurora, a framework to evaluate malware classifiers based on their confidence quality and operational resilience. Aurora subjects the confidence profile of a given model to verification to assess…
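The abstract refers to selective classification and confidence-error alignment without saying how they are scored. Below is a minimal Python sketch of one common way to do this: the standard risk-coverage formulation from the selective-classification literature. It is not Aurora's own metric, which the truncated abstract does not specify. Samples are accepted in order from most to least confident. Selective risk is the error rate among the accepted samples, and AURC averages that risk over all coverage levels. Grouping the results by test period gives a crude view of stability over time. The function names, the per-period spread used as a stability indicator, and the toy monthly records are all hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def risk_coverage_curve(confidences, correct):
    """Return (coverage, selective_risk) pairs, accepting samples from most to least confident."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need non-empty inputs of equal length")
    # Stable descending sort: ties keep their input order.
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    n = len(order)
    curve, errors = [], 0
    for k, i in enumerate(order, start=1):
        if not correct[i]:
            errors += 1
        # Coverage = fraction of samples accepted; risk = error rate among them.
        curve.append((k / n, errors / k))
    return curve


def aurc(confidences, correct):
    """Discrete area under the risk-coverage curve (mean selective risk over coverage levels)."""
    curve = risk_coverage_curve(confidences, correct)
    return sum(risk for _, risk in curve) / len(curve)


def aurc_by_period(records):
    """records: iterable of (period, confidence, is_correct); returns {period: AURC} in period order."""
    grouped = defaultdict(lambda: ([], []))
    for period, conf, ok in records:
        grouped[period][0].append(conf)
        grouped[period][1].append(ok)
    return {p: aurc(confs, oks) for p, (confs, oks) in sorted(grouped.items())}


# Hypothetical monthly test slices: (month, model confidence, prediction correct?)
records = [
    ("2024-01", 0.95, True), ("2024-01", 0.80, True), ("2024-01", 0.60, False),
    ("2024-02", 0.90, False), ("2024-02", 0.85, True), ("2024-02", 0.55, True),
]
per_month = aurc_by_period(records)
# Spread of per-period AURC as one crude stability indicator (an assumption, not Aurora's metric).
spread = pstdev(list(per_month.values()))
print({p: round(v, 3) for p, v in per_month.items()})  # {'2024-01': 0.111, '2024-02': 0.611}
print(round(spread, 3))  # 0.25
```

Lower AURC means the model's most confident predictions are more often correct. In the toy data, the most confident prediction in the second month is wrong, so that month's AURC rises from about 0.11 to about 0.61 even though overall accuracy is the same. The spread across periods is only one possible way to summarize stability.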

Taxonomy

Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Spam and Phishing Detection

Methods: Sparse Evolutionary Training