A Voter-Based Stochastic Rejection-Method Framework for Asymptotically Safe Language Model Outputs

Jake R. Watts; Joel Sokol

arXiv:2407.16994 · cs.AI · September 30, 2025

TL;DR

This paper introduces Repeated Checking with Regeneration (RCR), a voter-based stochastic rejection framework for making language model outputs safer. Multiple LLM checkers vote on each response, and the response is regenerated whenever disapproval reaches a threshold. The result is a low failure rate at Pareto-optimal cost.

Contribution

The paper presents a novel voting-based rejection method for LLM outputs. It achieves a desired expected failure rate at Pareto-optimal cost and does not depend on the language model used.

Findings

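01. The failure rate provably decreases exponentially as a function of cost.
02. The cost and failure-rate models reasonably estimate the system's actual performance, even with limited data.
03. The approach does not depend on the language model used, so it applies to models regardless of size.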
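Finding 01 can be made concrete with a deliberately simplified toy model. This is an illustrative assumption, not the paper's cost or failure-rate estimator. Assume each generated output is unacceptable with probability p. Checkers vote independently: each approves an unacceptable output with probability α and an acceptable one with probability β > α. Checkers are polled one at a time, and an output is accepted only if all k approve; the first disapproval triggers regeneration. Under these assumptions, let s_k be the probability that a single draw is accepted, F_k the probability that the accepted output is unacceptable, and C_k the total number of checker calls:

```latex
\begin{aligned}
s_k &= p\,\alpha^{k} + (1-p)\,\beta^{k} \\
F_k &= \frac{p\,\alpha^{k}}{s_k} \;\le\; \frac{p}{1-p}\left(\frac{\alpha}{\beta}\right)^{k} \\
\mathbb{E}[C_k] &= \frac{1}{s_k}\left(p\sum_{i=0}^{k-1}\alpha^{i} + (1-p)\sum_{i=0}^{k-1}\beta^{i}\right) \;\le\; \frac{k}{s_k}
\end{aligned}
```

In this toy setting, each additional required approval multiplies the failure bound by α/β. When β is close to 1, s_k stays near 1 − p, so expected cost grows roughly linearly in k while F_k shrinks geometrically. That is the sense in which the failure rate falls exponentially with cost. The RCR scheme described in the abstract uses a disapproval threshold and is not limited to this unanimous-vote special case.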

Abstract

We propose an approach for preventing unsafe or otherwise low-quality large language model (LLM) outputs by leveraging the stochasticity of LLMs, an approach we call Repeated Checking with Regeneration (RCR). In this system, LLM checkers vote on the acceptability of a generated output, regenerating it if a threshold of disapproval is reached, until sufficient checkers approve. Based on our estimators for cost and failure rate and experimental data tailored to the application, our algorithm achieves a desired expected failure rate at Pareto-optimal cost. The failure rate provably decreases exponentially as a function of cost, and the models reasonably estimate the actual performance of such a system in action, even with limited data. This approach does not depend on the language model used, and could allow cheap, small LLMs to control, constrain, or at some tasks even outperform very…
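The voting loop described in the abstract can be sketched as a Monte Carlo simulation. This is a minimal sketch under stated assumptions, not the authors' implementation or their estimators. The generator and checkers are replaced by hypothetical random draws, and each checker votes independently with a fixed approval probability (`approve_if_bad`, `approve_if_good`). Checkers are polled one at a time until `approvals_needed` approvals arrive (accept) or `disapproval_threshold` disapprovals arrive (regenerate). All function and parameter names are hypothetical.

```python
import random


def rcr_episode(p_bad, approve_if_bad, approve_if_good,
                approvals_needed, disapproval_threshold,
                rng, max_generations=100_000):
    """Simulate one hypothetical RCR run until some output is accepted.

    Returns (accepted_output_is_bad, checker_calls, generations).
    Assumes independent checkers with fixed approval probabilities.
    """
    checker_calls = 0
    for generation in range(1, max_generations + 1):
        # Hypothetical generator: each draw is unacceptable with probability p_bad.
        is_bad = rng.random() < p_bad
        approve_prob = approve_if_bad if is_bad else approve_if_good
        approvals = disapprovals = 0
        # Poll checkers one at a time until enough approve or too many object.
        while approvals < approvals_needed and disapprovals < disapproval_threshold:
            checker_calls += 1
            if rng.random() < approve_prob:
                approvals += 1
            else:
                disapprovals += 1
        if approvals >= approvals_needed:
            return is_bad, checker_calls, generation
        # Disapproval threshold reached: discard this output and regenerate.
    raise RuntimeError("no output accepted within max_generations")


def estimate(n_trials, seed=0, **params):
    """Monte Carlo estimate of (failure rate, mean checker calls per accepted output)."""
    rng = random.Random(seed)
    failures = total_calls = 0
    for _ in range(n_trials):
        is_bad, calls, _ = rcr_episode(rng=rng, **params)
        failures += is_bad
        total_calls += calls
    return failures / n_trials, total_calls / n_trials


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    for k in range(1, 5):
        fail, cost = estimate(20_000, p_bad=0.2, approve_if_bad=0.3,
                              approve_if_good=0.9, approvals_needed=k,
                              disapproval_threshold=1)
        print(f"approvals_needed={k}: failure rate ~ {fail:.4f}, "
              f"checker calls ~ {cost:.2f}")
```

With `disapproval_threshold=1`, the simulation reduces to requiring unanimous approval. In that case, raising `approvals_needed` should drive the simulated failure rate down roughly geometrically while the mean number of checker calls grows. Raising `disapproval_threshold` trades fewer regenerations against a higher chance that an unacceptable output collects enough approvals.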