Confidential Guardian: Cryptographically Prohibiting the Abuse of Model Abstention

Stephan Rabanser; Ali Shahin Shamsabadi; Olive Franzese; Xiao Wang; Adrian Weller; Nicolas Papernot

arXiv:2505.23968 · cs.CR · June 2, 2025

TL;DR

This paper introduces Confidential Guardian, a framework that detects malicious manipulation of model confidence scores in cautious prediction systems, ensuring trustworthy abstention decisions in safety-critical applications.

Contribution

The paper identifies a novel threat model, demonstrates it with an uncertainty-inducing attack called Mirage, and proposes Confidential Guardian, a verification framework that combines calibration analysis on a reference dataset with zero-knowledge proofs of verified inference to detect confidence manipulation.

Findings

01. Mirage can covertly reduce confidence in targeted input regions while maintaining high predictive performance.

02. Confidential Guardian detects artificially suppressed confidence scores by analyzing calibration metrics on a reference dataset.

03. The framework provides verifiable assurances of genuine model uncertainty.

Abstract

Cautious predictions -- where a machine learning model abstains when uncertain -- are crucial for limiting harmful errors in safety-critical applications. In this work, we identify a novel threat: a dishonest institution can exploit these mechanisms to discriminate or unjustly deny services under the guise of uncertainty. We demonstrate the practicality of this threat by introducing an uncertainty-inducing attack called Mirage, which deliberately reduces confidence in targeted input regions, thereby covertly disadvantaging specific individuals. At the same time, Mirage maintains high predictive performance across all data points. To counter this threat, we propose Confidential Guardian, a framework that analyzes calibration metrics on a reference dataset to detect artificially suppressed confidence. Additionally, it employs zero-knowledge proofs of verified inference to ensure that…
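The calibration check described in the abstract can be illustrated with a small sketch. This is not the paper's protocol: it runs in the clear, with no zero-knowledge proof, on synthetic data, and the function names, the tolerance `alpha`, and the `min_count` cutoff are assumptions made for illustration. The idea is that if a model is correct far more often than its reported confidence suggests in some confidence bin, that bin is a candidate for artificially suppressed confidence.

```python
import numpy as np

def reliability_bins(confidences, correct, n_bins=10):
    """Per-bin (mean confidence, accuracy, count) over equal-width bins on [0, 1]."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, so indices run from 0 to n_bins - 1.
    idx = np.clip(np.digitize(confidences, edges[1:-1]), 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((confidences[mask].mean(), correct[mask].mean(), int(mask.sum())))
    return rows

def expected_calibration_error(rows):
    """Count-weighted mean of |accuracy - confidence| across non-empty bins."""
    total = sum(n for _, _, n in rows)
    return sum(n * abs(acc - conf) for conf, acc, n in rows) / total

def flag_underconfident_bins(rows, alpha=0.1, min_count=20):
    """Bins where accuracy exceeds mean confidence by more than alpha (hypothetical tolerance)."""
    return [(conf, acc, n) for conf, acc, n in rows
            if n >= min_count and acc - conf > alpha]

# Synthetic reference data (an assumption, not data from the paper).
rng = np.random.default_rng(0)

# Honest model: reported confidence matches the probability of being correct.
honest_conf = rng.uniform(0.5, 1.0, 5000)
honest_correct = rng.random(5000) < honest_conf

# Targeted region: the model is right about 95% of the time,
# but reports confidence between 0.5 and 0.6.
target_conf = rng.uniform(0.5, 0.6, 1000)
target_correct = rng.random(1000) < 0.95

tampered_conf = np.concatenate([honest_conf, target_conf])
tampered_correct = np.concatenate([honest_correct, target_correct])

honest_rows = reliability_bins(honest_conf, honest_correct)
tampered_rows = reliability_bins(tampered_conf, tampered_correct)

print("honest ECE:  ", expected_calibration_error(honest_rows))
print("tampered ECE:", expected_calibration_error(tampered_rows))
print("flagged bins:", flag_underconfident_bins(tampered_rows))
```

In this sketch the honest model produces no flagged bins, while the tampered model shows a low-confidence bin whose accuracy sits well above its reported confidence. In the framework the abstract describes, zero-knowledge proofs of verified inference accompany the calibration analysis; the sketch omits that part entirely.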

Code & Models

Repositories

cleverhans-lab/confidential-guardian
Official

Taxonomy

Topics: Cryptographic Implementations and Security · Advanced Malware Detection Techniques · Security and Verification in Computing