Consensus Sampling for Safer Generative AI
Adam Tauman Kalai, Yael Tauman Kalai, Or Zamir

TL;DR
This paper introduces consensus sampling, a robust aggregation method for generative AI safety that combines multiple distributions to mitigate risks and abstains when agreement is insufficient.
Contribution
It proposes a novel, model-agnostic algorithm for safe distribution aggregation with formal guarantees and demonstrates its effectiveness through experiments.
Findings
Consensus sampling achieves risk competitive with the average risk of the safest subset of the input distributions.
The method provides formal R-robustness guarantees against adversarial influence.
Experiments show the approach works on synthetic and image generation tasks.
Abstract
Motivated by undetectable risks in generative AI, we study a general robust aggregation problem: how to aggregate several probability distributions to boost safety. We present consensus sampling, a black-box algorithm that, given k distributions, has risk competitive with the average risk of the safest subset of them while abstaining when there is insufficient agreement. This yields an architecture-agnostic approach to generative-model safety when the distributions are induced by models that can sample and evaluate output probabilities. We formalize the guarantee through R-robustness, which also bounds information leakage and adversarial influence. Inspired by robust statistics and the provable copyright protection algorithm of Vyas et al. (2023), we show that while a standard mixture is vulnerable to one unsafe constituent, a pointwise-median construction provides robust intuition, and our…
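The contrast the abstract draws between a standard mixture and a pointwise median can be illustrated on a toy example. The sketch below is not the paper's consensus sampling algorithm; it only compares the two aggregation rules on three small discrete distributions. The outcome names, the probabilities, and the choice to treat the mass left over by the median as an abstention probability are all assumptions made for illustration.

```python
from statistics import median

def mixture(dists):
    """Uniform mixture: average the probability of each outcome."""
    outcomes = set().union(*dists)
    k = len(dists)
    return {y: sum(d.get(y, 0.0) for d in dists) / k for y in outcomes}

def pointwise_median(dists):
    """Median probability per outcome.

    The medians need not sum to 1. Assumption for this sketch: if they
    sum to less than 1, the missing mass is treated as abstention; if
    they sum to more than 1, they are rescaled to sum to 1.
    """
    outcomes = set().union(*dists)
    scores = {y: median(d.get(y, 0.0) for d in dists) for y in outcomes}
    total = sum(scores.values())
    if total > 1.0:
        scores = {y: s / total for y, s in scores.items()}
        total = 1.0
    return scores, 1.0 - total

# Hypothetical inputs: two distributions that roughly agree, and one
# unsafe distribution that puts all of its mass on a "bad" outcome.
dists = [
    {"a": 0.6, "b": 0.4},
    {"a": 0.5, "b": 0.5},
    {"bad": 1.0},
]

mix = mixture(dists)
med, abstain = pointwise_median(dists)
print("mixture mass on 'bad':", mix["bad"])
print("median mass on 'bad':", med["bad"])
print("abstention mass under the median rule:", abstain)
```

In this toy setting the single unsafe distribution contributes a third of the mixture's mass to the bad outcome, while the pointwise median assigns it none, because two of the three distributions give it zero probability. The cost is that some mass is left unassigned, which this sketch reads as abstaining.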
