Towards Provably Secure Generative AI: Reliable Consensus Sampling

Yu Cui; Hang Fu; Sicheng Pan; Zhuoyu Sun; Yifei Liu; Yuhong Nie; Bo Ran; Baohan Huang; Xufeng Zhang; Haibin Zhang; Cong Zuo; Licheng Wang

arXiv:2512.24925·cs.CR·January 1, 2026

Open Access

TL;DR

This paper introduces Reliable Consensus Sampling (RCS), a new algorithm for generative AI that offers provable security, improved robustness against malicious manipulation, and better utility without abstention, supported by theoretical guarantees and extensive experiments.

Contribution

The paper proposes RCS, a novel primitive that enhances security and robustness of generative AI, eliminating abstention and providing theoretical risk control.

Findings

01. RCS significantly improves robustness against adversarial models.

02. RCS maintains utility comparable to existing methods.

03. Theoretical guarantees ensure controllable risk thresholds.

Abstract

Existing research on generative AI security is primarily driven by mutually reinforcing attack and defense methodologies grounded in empirical experience. This dynamic frequently gives rise to previously unknown attacks that can circumvent current detection and prevention. This necessitates the continual updating of security mechanisms. Constructing generative AI with provable security and theoretically controllable risk is therefore necessary. Consensus Sampling (CS) is a promising algorithm toward provably secure AI. It controls risk by leveraging overlap in model output probabilities. However, we find that CS relies on frequent abstention to avoid unsafe outputs, which reduces utility. Moreover, CS becomes highly vulnerable when unsafe models are maliciously manipulated. To address these issues, we propose a new primitive called Reliable Consensus Sampling (RCS) that traces…
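To make the idea of "leveraging overlap in model output probabilities" concrete, here is a toy sketch of an overlap-based accept-or-abstain sampler. It is an illustration under stated assumptions, not the algorithm from the paper or from the original CS work: the function name `overlap_sample`, the dictionary representation of each model's output distribution, the min-over-mean acceptance rule, and the `rounds` budget are all hypothetical choices made for this example.

```python
import random

def overlap_sample(dists, rounds=10, rng=None):
    """Toy overlap-based sampler (hypothetical, for illustration only).

    dists:  list of dicts mapping a candidate output to its probability
            under one model.
    rounds: maximum number of proposals before abstaining (assumed budget).
    Returns an accepted output, or None to signal abstention.
    """
    rng = rng or random.Random()
    for _ in range(rounds):
        # Propose a candidate from one model chosen uniformly at random.
        model = rng.choice(dists)
        candidates = list(model)
        y = rng.choices(candidates, weights=[model[c] for c in candidates])[0]

        # Probability each model assigns to the proposed candidate.
        probs = [d.get(y, 0.0) for d in dists]
        mean_p = sum(probs) / len(probs)
        if mean_p == 0.0:
            continue

        # Assumed rule: accept with probability min / mean, so an output
        # that any single model rules out (probability 0) is never accepted.
        if rng.random() < min(probs) / mean_p:
            return y
    return None  # no candidate accepted within the budget: abstain
```

With this rule, accepted outputs are distributed in proportion to the smallest probability any model assigns them. When the models barely overlap, most proposals are rejected and the sampler returns `None`, which is the kind of utility loss through abstention that the abstract attributes to CS.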

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Smart Grid Security and Resilience