Trust, or Don't Predict: Introducing the CWSA Family for Confidence-Aware Model Evaluation
Kourosh Shahnazari, Seyed Moein Ayyoubzadeh, Mohammadali Keshtparvar, Pegah Ghaffari

TL;DR
This paper introduces the CWSA and CWSA+ metrics for evaluating confidence-aware models, explicitly rewarding correct confident predictions and penalizing overconfident errors, improving trust assessment in machine learning systems.
Contribution
The paper proposes two novel, interpretable metrics, CWSA and CWSA+, that evaluate model reliability under confidence thresholds better than traditional metrics.
Findings
CWSA and CWSA+ outperform classical metrics in detecting failure modes.
Metrics effectively distinguish between calibrated, overconfident, and underconfident models.
CWSA provides a reliable basis for safety-critical model evaluation.
Abstract
In recent machine learning systems, confidence scores are increasingly used to manage selective prediction, whereby a model can abstain from making a prediction when it is not confident. Yet conventional metrics such as accuracy, expected calibration error (ECE), and area under the risk-coverage curve (AURC) do not capture the actual reliability of predictions. These metrics either disregard confidence entirely, dilute valuable localized information through averaging, or fail to suitably penalize overconfident misclassifications, which can be particularly detrimental in real-world systems. We introduce two new metrics, Confidence-Weighted Selective Accuracy (CWSA) and its normalized variant CWSA+, that offer a principled and interpretable way to evaluate predictive models under confidence thresholds. Unlike existing methods, our metrics explicitly reward confident accuracy and…
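The abstract as shown here is truncated and does not give the formal definitions of CWSA or CWSA+, so the following is only a minimal sketch of the general idea it describes: retain predictions whose confidence meets a threshold, then score them so that confident correct predictions add to the total and confident errors subtract from it. The function name `selective_scores`, the parameter `tau`, and the signed weighting (+confidence for a correct prediction, -confidence for a wrong one) are illustrative assumptions, not the authors' formulas. Coverage and selective accuracy are included for comparison because they are the standard quantities in selective prediction.

```python
import numpy as np


def selective_scores(confidence, correct, tau):
    """Score predictions retained at confidence threshold tau.

    NOTE: the weighted score below is an illustrative assumption,
    not the CWSA definition from the paper.
    """
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)

    # The model abstains on every prediction below the threshold.
    retained = confidence >= tau
    n_retained = int(retained.sum())
    if n_retained == 0:
        # Nothing was predicted, so accuracy-style scores are undefined.
        return {
            "coverage": 0.0,
            "selective_accuracy": float("nan"),
            "weighted_score": float("nan"),
        }

    # Fraction of inputs on which the model did not abstain.
    coverage = n_retained / confidence.size

    # Plain accuracy over retained predictions; ignores how confident each was.
    selective_accuracy = float(correct[retained].mean())

    # Hypothetical weighting: +confidence if correct, -confidence if wrong,
    # so an overconfident error costs more than a hesitant one.
    signs = np.where(correct[retained], 1.0, -1.0)
    weighted_score = float((confidence[retained] * signs).mean())

    return {
        "coverage": coverage,
        "selective_accuracy": selective_accuracy,
        "weighted_score": weighted_score,
    }


if __name__ == "__main__":
    conf = [0.90, 0.80, 0.60, 0.95]
    hits = [True, False, True, True]
    print(selective_scores(conf, hits, tau=0.7))
```

In this sketch, two models with the same selective accuracy can receive different weighted scores if one of them makes its errors at higher confidence, which is the kind of distinction the abstract says accuracy alone cannot make.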
Taxonomy
Topics: Scientific Computing and Data Management · Data Quality and Management · Anomaly Detection Techniques and Applications
