Trust, but Verify: Using Self-Supervised Probing to Improve Trustworthiness

Ailin Deng, Shen Li, Miao Xiong, Zhirui Chen, and Bryan Hooi

arXiv:2302.02628 · cs.LG · February 7, 2023

TL;DR

This paper introduces a self-supervised probing framework to assess and reduce overconfidence in deep learning models, thereby enhancing their trustworthiness across multiple tasks and benchmarks.

Contribution

The paper proposes a flexible self-supervised probing method that improves model trustworthiness by detecting and mitigating overconfidence. The method is compatible with existing trustworthiness-related methods and can be added to them in a plug-and-play manner.

Findings

1. Effective in misclassification detection
2. Improves calibration of confidence scores
3. Enhances out-of-distribution detection

Abstract

Trustworthy machine learning is of primary importance to the practical deployment of deep learning models. While state-of-the-art models achieve astonishingly good performance in terms of accuracy, recent literature reveals that their predictive confidence scores unfortunately cannot be trusted: e.g., they are often overconfident when wrong predictions are made, or so even for obvious outliers. In this paper, we introduce a new approach of self-supervised probing, which enables us to check and mitigate the overconfidence issue for a trained model, thereby improving its trustworthiness. We provide a simple yet effective framework, which can be flexibly applied to existing trustworthiness-related methods in a plug-and-play manner. Extensive experiments on three trustworthiness-related tasks (misclassification detection, calibration and out-of-distribution detection) across various…
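To make the overconfidence problem described in the abstract concrete, the sketch below computes the expected calibration error (ECE), a widely used way to measure the gap between a model's stated confidence and its actual accuracy. This is an illustration only: the abstract does not say which calibration metric the paper reports, and the function name `expected_calibration_error`, the `n_bins` parameter, and the toy inputs are assumptions made for this example, not part of the authors' code.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Bin-size-weighted mean of |accuracy - mean confidence| over equal-width bins.

    Hypothetical helper for illustration; bins are [lo, hi), with 1.0 placed
    in the last bin.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Group predictions by confidence bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Toy data (made up): the same half-right predictions, reported with two
# different confidence levels.
outcomes = [True, False, False, True]
overconfident = expected_calibration_error([0.95] * 4, outcomes)
calibrated = expected_calibration_error([0.5] * 4, outcomes)
print(round(overconfident, 2), calibrated)
```

In the toy data, a model that is right half the time but reports 0.95 confidence has a large calibration gap, while the same predictions reported at 0.5 confidence have none. A lower ECE indicates confidence scores that better match observed accuracy.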

Code & Models

Repositories

d-ailin/ssprobing (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications