Confidential Guardian: Cryptographically Prohibiting the Abuse of Model Abstention
Stephan Rabanser, Ali Shahin Shamsabadi, Olive Franzese, Xiao Wang, Adrian Weller, Nicolas Papernot

TL;DR
This paper introduces Confidential Guardian, a framework that detects malicious manipulation of model confidence scores in cautious prediction systems, ensuring trustworthy abstention decisions in safety-critical applications.
Contribution
The paper presents a novel threat model, demonstrates it with an uncertainty-inducing attack called Mirage, and proposes Confidential Guardian, a verification framework that combines calibration analysis on a reference dataset with zero-knowledge proofs of verified inference to detect confidence manipulation.
Findings
Mirage can covertly reduce confidence in targeted input regions while maintaining high predictive performance.
Confidential Guardian effectively detects artificially suppressed confidence scores.
The framework provides verifiable assurances of genuine model uncertainty.
Abstract
Cautious predictions -- where a machine learning model abstains when uncertain -- are crucial for limiting harmful errors in safety-critical applications. In this work, we identify a novel threat: a dishonest institution can exploit these mechanisms to discriminate or unjustly deny services under the guise of uncertainty. We demonstrate the practicality of this threat by introducing an uncertainty-inducing attack called Mirage, which deliberately reduces confidence in targeted input regions, thereby covertly disadvantaging specific individuals. At the same time, Mirage maintains high predictive performance across all data points. To counter this threat, we propose Confidential Guardian, a framework that analyzes calibration metrics on a reference dataset to detect artificially suppressed confidence. Additionally, it employs zero-knowledge proofs of verified inference to ensure that…
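The calibration idea in the abstract can be illustrated with a minimal sketch. It assumes a reference dataset for which we already have the model's confidence scores and whether each prediction was correct. The function names (`abstain_mask`, `binned_calibration`, `flag_underconfident_bins`) and the parameters `tau`, `alpha`, and `min_count` are hypothetical choices made for this illustration, not the paper's procedure. The zero-knowledge proof component is not modeled here.

```python
import numpy as np

def abstain_mask(confidences, tau=0.9):
    """True where a cautious predictor would abstain (confidence below tau).
    tau is an assumed threshold chosen for illustration."""
    return np.asarray(confidences, dtype=float) < tau

def binned_calibration(confidences, correct, n_bins=10):
    """Return (mean confidence, accuracy, count) for each non-empty
    equal-width confidence bin on [0, 1]."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Digitize against interior edges so that confidence 1.0 lands in the last bin.
    idx = np.digitize(confidences, edges[1:-1])
    rows = []
    for b in range(n_bins):
        members = idx == b
        if members.any():
            rows.append((confidences[members].mean(),
                         correct[members].mean(),
                         int(members.sum())))
    return rows

def expected_calibration_error(rows):
    """Count-weighted mean absolute gap between accuracy and confidence."""
    total = sum(count for _, _, count in rows)
    return sum(count * abs(acc - conf) for conf, acc, count in rows) / total

def flag_underconfident_bins(rows, alpha=0.1, min_count=20):
    """Bins whose accuracy exceeds mean confidence by more than alpha.
    A large positive gap is consistent with suppressed confidence;
    alpha and min_count are assumed tolerances, not values from the paper."""
    return [(conf, acc, count) for conf, acc, count in rows
            if count >= min_count and acc - conf > alpha]

# Synthetic reference data: one region reported at 0.55 confidence although
# the model is right 95% of the time there, and one well-calibrated region.
conf = np.array([0.55] * 100 + [0.95] * 100)
correct = np.array(([1] * 95 + [0] * 5) * 2)

rows = binned_calibration(conf, correct)
print("ECE:", expected_calibration_error(rows))
print("Suspicious bins:", flag_underconfident_bins(rows))
print("Abstentions at tau=0.9:", int(abstain_mask(conf).sum()))
```

In this toy setup the 0.55 bin is flagged because its accuracy is far higher than its reported confidence, and every point in that bin would trigger an abstention at `tau=0.9`. That is the pattern a confidence-suppressing attack would be expected to leave on a reference dataset.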
Taxonomy
Topics: Cryptographic Implementations and Security · Advanced Malware Detection Techniques · Security and Verification in Computing
