A Voter-Based Stochastic Rejection-Method Framework for Asymptotically Safe Language Model Outputs
Jake R. Watts, Joel Sokol

TL;DR
This paper introduces Repeated Checking with Regeneration (RCR), a voter-based stochastic rejection framework in which multiple LLM checkers vote to approve a generated output or trigger its regeneration, reaching a desired expected failure rate at Pareto-optimal cost.
Contribution
The paper presents a voting-based rejection method for LLM outputs that achieves a desired expected failure rate at Pareto-optimal cost and does not depend on the language model used.
Findings
Failure rate decreases exponentially with cost
The cost and failure-rate estimators reasonably predict the system's actual performance, even with limited data
Applicable to various language models regardless of size
Abstract
We propose an approach for preventing unsafe or otherwise low-quality large language model (LLM) outputs by leveraging the stochasticity of LLMs, an approach we call Repeated Checking with Regeneration (RCR). In this system, LLM checkers vote on the acceptability of a generated output, regenerating it if a threshold of disapproval is reached, until sufficient checkers approve. Based on our estimators for cost and failure rate and experimental data tailored to the application, our algorithm achieves a desired expected failure rate at Pareto-optimal cost. The failure rate provably decreases exponentially as a function of cost, and the models reasonably estimate the actual performance of such a system in action, even with limited data. This approach does not depend on the language model used, and could allow cheap, small LLMs to control, constrain, or at some tasks even outperform very…
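The voting loop described in the abstract can be sketched as follows. This is only an illustrative reading of the abstract, not the authors' algorithm: the names `rcr`, `approvals_needed`, `rejections_to_regenerate`, and `max_generations`, the sequential order in which checkers are queried, and all probabilities in the toy simulation are assumptions made for this example.

```python
import random
from typing import Callable, Optional, Tuple


def rcr(
    generate: Callable[[], str],
    check: Callable[[str], bool],
    approvals_needed: int,
    rejections_to_regenerate: int,
    max_generations: int = 100,
) -> Tuple[Optional[str], int, int]:
    """Hypothetical RCR loop: returns (accepted output or None, generations, checks)."""
    generations = 0
    checks = 0
    while generations < max_generations:
        candidate = generate()
        generations += 1
        approvals = 0
        rejections = 0
        # Query checkers one at a time until either threshold is reached.
        while approvals < approvals_needed and rejections < rejections_to_regenerate:
            checks += 1
            if check(candidate):
                approvals += 1
            else:
                rejections += 1
        if approvals >= approvals_needed:
            return candidate, generations, checks
        # Otherwise the disapproval threshold was hit: regenerate.
    return None, generations, checks


def simulate(approvals_needed: int, trials: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Toy simulation with made-up probabilities: returns (failure rate, mean checks)."""
    rng = random.Random(seed)

    def generate() -> str:
        # Assumed: the generator produces a safe output 80% of the time.
        return "safe" if rng.random() < 0.8 else "unsafe"

    def check(output: str) -> bool:
        # Assumed: a noisy checker approves safe outputs 90% of the time
        # and unsafe outputs 20% of the time.
        return rng.random() < (0.9 if output == "safe" else 0.2)

    failures = 0
    total_checks = 0
    for _ in range(trials):
        output, _, checks = rcr(generate, check, approvals_needed, 2)
        total_checks += checks
        if output != "safe":
            failures += 1
    return failures / trials, total_checks / trials


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        rate, cost = simulate(k)
        print(f"approvals_needed={k}: failure rate={rate:.4f}, mean checks={cost:.2f}")
```

In this toy setting, raising `approvals_needed` trades more checker calls for fewer accepted unsafe outputs, which is the cost versus failure-rate trade-off the abstract refers to. The paper's estimators and its choice of Pareto-optimal thresholds are not reproduced here.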
