From Misclassifications to Outliers: Joint Reliability Assessment in Classification

Yang Li; Youyang Sha; Yinzhi Wang; Timothy Hospedales; Xi Shen; Shell Xu Hu; Xuanlong Yu

arXiv:2603.03903·cs.CV·March 5, 2026

TL;DR

This paper introduces a unified framework and new metrics for jointly assessing out-of-distribution detection and failure prediction in classifiers, demonstrating improved reliability and practical guidance for real-world deployment.

Contribution

It proposes a joint evaluation framework with novel metrics and extends the SURE method to enhance classifier reliability across various OOD scenarios.

Findings

1. Double scoring functions yield more reliable classifiers than traditional single scoring approaches.

2. OOD-based approaches are more effective under simple or far-OOD shifts.

3. The new SURE+ method significantly improves reliability in diverse scenarios.

Abstract

Building reliable classifiers is a fundamental challenge for deploying machine learning in real-world applications. A reliable system should not only detect out-of-distribution (OOD) inputs but also anticipate in-distribution (ID) errors by assigning low confidence to potentially misclassified samples. Yet most prior work treats OOD detection and failure prediction as separate problems, overlooking their close connection. We argue that reliability requires evaluating them jointly. To this end, we propose a unified evaluation framework that integrates OOD detection and failure prediction, quantified by our new metrics DS-F1 and DS-AURC, where DS denotes double scoring functions. Experiments on the OpenOOD benchmark show that double scoring functions yield classifiers that are substantially more reliable than traditional single scoring approaches. Our analysis further reveals that…
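To make the idea of joint evaluation concrete, here is a minimal sketch of a double-scoring accept rule and an F1 score computed against it. The abstract does not define DS-F1 or DS-AURC, so this is not the paper's metric: the function names `joint_accept` and `joint_f1`, the two-threshold rule, the choice of "ID and correctly classified" as the positive class, and the toy data are all assumptions made for illustration.

```python
from typing import List, Sequence


def joint_accept(
    ood_scores: Sequence[float],
    conf_scores: Sequence[float],
    tau_ood: float,
    tau_conf: float,
) -> List[bool]:
    """Hypothetical double-scoring rule: accept a sample only if its
    OOD score AND its confidence score both clear their thresholds."""
    return [o >= tau_ood and c >= tau_conf
            for o, c in zip(ood_scores, conf_scores)]


def joint_f1(
    accepted: Sequence[bool],
    is_id: Sequence[bool],
    is_correct: Sequence[bool],
) -> float:
    """F1 where the positive class is 'ID and correctly classified'.
    This is an illustrative assumption, not the paper's DS-F1."""
    should_accept = [i and c for i, c in zip(is_id, is_correct)]
    tp = sum(a and s for a, s in zip(accepted, should_accept))
    fp = sum(a and not s for a, s in zip(accepted, should_accept))
    fn = sum((not a) and s for a, s in zip(accepted, should_accept))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (made up): higher OOD score means "looks in-distribution".
ood_scores = [0.9, 0.8, 0.2, 0.95, 0.1, 0.7]
conf_scores = [0.9, 0.3, 0.8, 0.85, 0.9, 0.6]
is_id = [True, True, False, True, False, True]
is_correct = [True, False, False, True, False, True]

# Double scoring: both thresholds are active.
double = joint_accept(ood_scores, conf_scores, tau_ood=0.5, tau_conf=0.5)
# Single scoring: disable the OOD threshold, keep only confidence.
single = joint_accept(ood_scores, conf_scores,
                      tau_ood=float("-inf"), tau_conf=0.5)

f1_double = joint_f1(double, is_id, is_correct)
f1_single = joint_f1(single, is_id, is_correct)
print(f1_double, f1_single)
```

In this toy setup the confidence score alone lets two confidently scored OOD samples through, while the second score rejects them. That is the kind of gap a joint evaluation is meant to expose, although the paper's actual metrics may be defined differently.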

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Imbalanced Data Classification Techniques · Anomaly Detection Techniques and Applications