Trust, but Verify: Using Self-Supervised Probing to Improve Trustworthiness
Ailin Deng, Shen Li, Miao Xiong, Zhirui Chen, and Bryan Hooi

TL;DR
This paper introduces a self-supervised probing framework to assess and reduce overconfidence in deep learning models, thereby enhancing their trustworthiness across multiple tasks and benchmarks.
Contribution
It proposes a novel, flexible self-supervised probing method that improves model trustworthiness by addressing overconfidence, and that can be combined with existing trustworthiness methods.
Findings
Effective in misclassification detection
Improves calibration of confidence scores
Enhances out-of-distribution detection
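To make the findings above concrete, here is a minimal sketch of two standard quantities often used in this area: the maximum softmax probability as a confidence score (a common baseline for misclassification and out-of-distribution detection) and the expected calibration error (ECE) as a calibration measure. This is not the paper's probing method, and this page does not say which exact metrics the paper reports. The function names, the bin count, and the binning convention are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_softmax_confidence(logits):
    """Return (predicted_class, confidence) using the maximum softmax probability.

    Baseline confidence score only; a probing approach would adjust this value.
    """
    probs = softmax(logits)
    conf = max(probs)
    return probs.index(conf), conf


def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error with equal-width confidence bins.

    confidences: floats in [0, 1], one per prediction.
    correct: booleans, True where the prediction matched the label.
    Bins are [k/n_bins, (k+1)/n_bins), with 1.0 placed in the last bin
    (an illustrative convention; other implementations bin differently).
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("need equally sized, non-empty inputs")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece
```

An overconfident model shows up here as bins whose average confidence exceeds their accuracy. For example, two predictions made at confidence 0.9 with only one correct give a gap of about 0.4 in that bin.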
Abstract
Trustworthy machine learning is of primary importance to the practical deployment of deep learning models. While state-of-the-art models achieve astonishingly good performance in terms of accuracy, recent literature reveals that their predictive confidence scores unfortunately cannot be trusted: e.g., they are often overconfident when wrong predictions are made, or so even for obvious outliers. In this paper, we introduce a new approach of self-supervised probing, which enables us to check and mitigate the overconfidence issue for a trained model, thereby improving its trustworthiness. We provide a simple yet effective framework, which can be flexibly applied to existing trustworthiness-related methods in a plug-and-play manner. Extensive experiments on three trustworthiness-related tasks (misclassification detection, calibration and out-of-distribution detection) across various…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications
