Towards Provably Secure Generative AI: Reliable Consensus Sampling
Yu Cui, Hang Fu, Sicheng Pan, Zhuoyu Sun, Yifei Liu, Yuhong Nie, Bo Ran, Baohan Huang, Xufeng Zhang, Haibin Zhang, Cong Zuo, Licheng Wang

TL;DR
This paper introduces Reliable Consensus Sampling (RCS), a new algorithm for generative AI that offers provable security, improved robustness against malicious manipulation, and better utility without abstention, supported by theoretical guarantees and extensive experiments.
Contribution
The paper proposes RCS, a novel primitive that enhances security and robustness of generative AI, eliminating abstention and providing theoretical risk control.
Findings
RCS significantly improves robustness against adversarial models.
RCS maintains utility comparable to existing methods.
Theoretical guarantees ensure controllable risk thresholds.
Abstract
Existing research on generative AI security is primarily driven by mutually reinforcing attack and defense methodologies grounded in empirical experience. This dynamic frequently gives rise to previously unknown attacks that circumvent current detection and prevention, which in turn forces continual updates to security mechanisms. Constructing generative AI with provable security and theoretically controllable risk is therefore necessary. Consensus Sampling (CS) is a promising algorithm toward provably secure AI: it controls risk by leveraging overlap in model output probabilities. However, we find that CS relies on frequent abstention to avoid unsafe outputs, which reduces utility. Moreover, CS becomes highly vulnerable when unsafe models are maliciously manipulated. To address these issues, we propose a new primitive called Reliable Consensus Sampling (RCS) that traces…
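To make the abstract's description of CS concrete, the following is a minimal toy sketch of an overlap-based accept-or-abstain sampler. It is not taken from this page and is not the authors' RCS algorithm, which the truncated abstract does not specify. The acceptance rule (mean of the `s` smallest model probabilities divided by the mean over all models) is an assumption based on how consensus sampling is commonly described. The function name `consensus_sample` and all parameters are hypothetical, and models are reduced to small discrete distributions.

```python
import random


def consensus_sample(dists, s, max_rounds, rng):
    """Toy overlap-based sampler (illustrative assumption, not the paper's code).

    dists: list of dicts mapping a candidate output to its probability
           under one model.
    s: number of lowest-probability models used in the acceptance ratio.
    max_rounds: attempts before abstaining.
    rng: a random.Random instance.
    Returns an output, or None to signal abstention.
    """
    k = len(dists)
    for _ in range(max_rounds):
        # Draw a candidate from one model chosen uniformly at random.
        model = rng.choice(dists)
        outputs = list(model.keys())
        weights = [model[o] for o in outputs]
        y = rng.choices(outputs, weights=weights, k=1)[0]

        # Probability each model assigns to the candidate, ascending.
        probs = sorted(d.get(y, 0.0) for d in dists)
        low_mean = sum(probs[:s]) / s   # agreement among the s most skeptical models
        all_mean = sum(probs) / k       # positive, because y was sampled from a model

        # Accept in proportion to how much the models overlap on y.
        if rng.random() < low_mean / all_mean:
            return y
    return None  # abstain: no candidate had enough overlap


if __name__ == "__main__":
    rng = random.Random(0)
    safe = {"safe": 1.0}
    mixed = {"safe": 0.5, "unsafe": 0.5}
    print(consensus_sample([safe, mixed], s=1, max_rounds=5, rng=rng))
```

In this toy setting, an output that only one model supports has an acceptance ratio of zero when `s=1`, so it is never emitted. When the models barely overlap, the sampler can only abstain, which illustrates the utility cost the abstract attributes to CS.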
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Smart Grid Security and Resilience
