Empirical Optimal Risk to Quantify Model Trustworthiness for Failure Detection

Shuang Ao; Stefan Rueger; Advaith Siddharthan

arXiv:2308.03179·cs.AI·August 8, 2023

TL;DR

This paper introduces the Excess Area Under the Optimal RC Curve (E-AUoptRC) and Trust Index (TI) as new metrics to better evaluate model trustworthiness and failure detection performance, especially at the optimal coverage point.

Contribution

The paper proposes novel metrics, E-AUoptRC and TI, for more meaningful failure detection evaluation, addressing limitations of existing risk-coverage curve metrics.

Findings

01. E-AUoptRC better reflects model trustworthiness

02. High overall accuracy does not guarantee high Trust Index

03. Proposed metrics outperform existing evaluation methods

Abstract

Failure detection (FD) in AI systems is a crucial safeguard for deployment in safety-critical tasks. The common evaluation method for FD performance is the risk-coverage (RC) curve, which reveals the trade-off between the data coverage rate and the performance on accepted data. One common way to quantify the RC curve is to calculate the area under it. However, this metric does not indicate how well suited a method is for FD, or what the optimal coverage rate should be. As FD aims to achieve higher performance with fewer data discarded, evaluating with partial coverage that excludes the most uncertain samples is more intuitive and meaningful than evaluating with full coverage. In addition, there is an optimal point in the coverage where the model could theoretically achieve ideal performance. We propose the Excess Area Under the Optimal RC Curve (E-AUoptRC), with the area in coverage from the…
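As a rough illustration of the risk-coverage trade-off the abstract describes, the sketch below builds an RC curve from per-sample confidence scores and error indicators, then averages the risk over all coverage levels as a simple discrete estimate of the area under the curve. The function names, the toy data, and the choice of 0/1 error as the risk are assumptions made for this example. It does not implement E-AUoptRC or TI, whose definitions are not given in this excerpt.

```python
from typing import List, Tuple


def risk_coverage_curve(
    confidences: List[float], errors: List[int]
) -> List[Tuple[float, float]]:
    """Return (coverage, risk) points, accepting samples from most to least confident.

    errors[i] is 1 if the model's prediction on sample i is wrong, else 0
    (an assumed 0/1 loss; other losses could be substituted).
    """
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    points = []
    cumulative_errors = 0
    for k, idx in enumerate(order, start=1):
        cumulative_errors += errors[idx]
        coverage = k / len(order)       # fraction of data accepted so far
        risk = cumulative_errors / k    # error rate on the accepted data
        points.append((coverage, risk))
    return points


def aurc(points: List[Tuple[float, float]]) -> float:
    """Mean risk over all coverage levels: a simple discrete area estimate."""
    return sum(risk for _, risk in points) / len(points)


# Hypothetical toy data: the two most confident predictions are correct,
# the two least confident ones are wrong.
confidences = [0.9, 0.8, 0.6, 0.4]
errors = [0, 0, 1, 1]

curve = risk_coverage_curve(confidences, errors)
for coverage, risk in curve:
    print(f"coverage={coverage:.2f}  risk={risk:.3f}")
print(f"AURC estimate: {aurc(curve):.4f}")
```

In this toy case the risk stays at zero up to 50% coverage and rises only once the uncertain samples are admitted, which is the kind of behaviour that makes a partial-coverage evaluation informative.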

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Anomaly Detection Techniques and Applications