From Misclassifications to Outliers: Joint Reliability Assessment in Classification
Yang Li, Youyang Sha, Yinzhi Wang, Timothy Hospedales, Xi Shen, Shell Xu Hu, Xuanlong Yu

TL;DR
This paper introduces a unified framework and new metrics for jointly assessing out-of-distribution detection and failure prediction in classifiers, demonstrating improved reliability and practical guidance for real-world deployment.
Contribution
It proposes a joint evaluation framework with novel metrics and extends the SURE method to enhance classifier reliability across various OOD scenarios.
Findings
Double scoring functions yield more reliable classifiers than traditional single scoring approaches.
OOD-based approaches are more effective under simple or far-OOD shifts.
The new SURE+ method significantly improves reliability in diverse scenarios.
Abstract
Building reliable classifiers is a fundamental challenge for deploying machine learning in real-world applications. A reliable system should not only detect out-of-distribution (OOD) inputs but also anticipate in-distribution (ID) errors by assigning low confidence to potentially misclassified samples. Yet most prior work treats OOD detection and failure prediction as separate problems, overlooking their close connection. We argue that reliability requires evaluating them jointly. To this end, we propose a unified evaluation framework that integrates OOD detection and failure prediction, quantified by our new metrics DS-F1 and DS-AURC, where DS denotes double scoring functions. Experiments on the OpenOOD benchmark show that double scoring functions yield classifiers that are substantially more reliable than traditional single scoring approaches. Our analysis further reveals that…
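Illustrative sketch
The abstract does not define DS-F1 or DS-AURC, so the following is only a minimal sketch of the general idea behind double scoring, not the paper's metric. It assumes a sample is accepted only when an OOD score and a separate confidence score both clear their thresholds, and that the "positive" class is an input that is both in-distribution and correctly classified. The function names, thresholds, and toy data are all hypothetical.

```python
def double_score_accept(ood_scores, conf_scores, tau_ood, tau_conf):
    """Accept a sample only if BOTH scores clear their thresholds (assumed rule)."""
    return [o >= tau_ood and c >= tau_conf
            for o, c in zip(ood_scores, conf_scores)]


def f1_of_acceptance(accepted, is_id, is_correct):
    """F1 of the accept decision.

    Assumption: a sample is 'positive' when it is in-distribution AND
    correctly classified; everything else should be rejected.
    """
    positive = [i and c for i, c in zip(is_id, is_correct)]
    tp = sum(a and p for a, p in zip(accepted, positive))
    fp = sum(a and not p for a, p in zip(accepted, positive))
    fn = sum((not a) and p for a, p in zip(accepted, positive))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: four samples, higher score = more trustworthy.
ood_scores = [0.9, 0.8, 0.2, 0.7]      # low value suggests an OOD input
conf_scores = [0.95, 0.3, 0.9, 0.8]    # low value suggests a likely misclassification
is_id = [True, True, False, True]
is_correct = [True, False, False, True]

# Double scoring: both thresholds must be met.
accepted_double = double_score_accept(ood_scores, conf_scores, 0.5, 0.5)
f1_double = f1_of_acceptance(accepted_double, is_id, is_correct)

# Single scoring baseline: threshold the confidence score only.
accepted_single = [c >= 0.5 for c in conf_scores]
f1_single = f1_of_acceptance(accepted_single, is_id, is_correct)
```

In this toy case the single confidence score accepts the third sample, an OOD input the classifier is confident about, while the additional OOD score rejects it. This is the kind of case a joint evaluation is meant to expose.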
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Imbalanced Data Classification Techniques · Anomaly Detection Techniques and Applications
