Optimal Bayesian Stopping for Efficient Inference of Consistent LLM Answers

Jingkai Huang; Will Ma; Zhengyuan Zhou

arXiv:2602.05395·stat.ML·February 6, 2026

TL;DR

This paper introduces a Bayesian stopping strategy for LLM inference that reduces sampling costs. Its "L-aggregated" policy tracks only the L-1 most frequent answer counts, is asymptotically optimal with L=3, and cuts LLM calls by up to 50% while keeping answer accuracy comparable.

Contribution

It proposes an L-aggregated stopping policy that tracks only the L-1 most frequent answer counts, which keeps the posterior computation fast, and proves that L=3 suffices for asymptotic optimality and strictly dominates prior-free baselines.

Findings

01 Achieves asymptotic optimality with L=3

02 Reduces LLM calls by up to 50%

03 Maintains comparable answer accuracy

Abstract

A simple strategy for improving LLM accuracy, especially in math and reasoning problems, is to sample multiple responses and submit the answer most consistently reached. In this paper we leverage Bayesian prior information to save on sampling costs, stopping once sufficient consistency is reached. Although the exact posterior is computationally intractable, we further introduce an efficient "L-aggregated" stopping policy that tracks only the L-1 most frequent answer counts. Theoretically, we prove that L=3 is all you need: this coarse approximation is sufficient to achieve asymptotic optimality, and strictly dominates prior-free baselines, while having a fast posterior computation. Empirically, this identifies the most consistent (i.e., mode) LLM answer using fewer samples, and can achieve similar answer accuracy while cutting the number of LLM calls (i.e., saving on LLM inference…
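
The sample-then-stop loop described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's policy: the stopping rule here is a stand-in that puts a uniform Beta(1, 1) prior on the leader's share of the samples that landed on the two most frequent answers, and stops once the posterior probability that the leader is truly ahead reaches a threshold. The names `prob_leader_ahead`, `sample_until_consistent`, `draw`, `confidence`, and `max_samples` are hypothetical. The only link to the L=3 case is that the stopping decision reads just the L-1 = 2 largest answer counts.

```python
from collections import Counter
from math import comb
from typing import Callable, Hashable, Tuple


def prob_leader_ahead(a: int, b: int) -> float:
    """P(theta > 1/2) for theta ~ Beta(a + 1, b + 1).

    Assumption (not from the paper): theta is the leader's share among
    samples that hit the top two answers, with a uniform prior. For
    integer parameters the Beta tail equals a binomial sum.
    """
    n = a + b + 1
    return sum(comb(n, j) for j in range(a + 1)) / 2 ** n


def sample_until_consistent(
    draw: Callable[[], Hashable],
    confidence: float = 0.95,   # hypothetical stopping threshold
    max_samples: int = 40,      # hypothetical sampling budget
) -> Tuple[Hashable, int]:
    """Draw answers one at a time; stop once the leader looks safe.

    `draw` stands in for one LLM call that returns a final answer.
    Returns the most frequent answer and the number of calls used.
    """
    counts: Counter = Counter()
    for n in range(1, max_samples + 1):
        counts[draw()] += 1
        top = counts.most_common(2)
        a = top[0][1]                          # count of the leading answer
        b = top[1][1] if len(top) > 1 else 0   # count of the runner-up
        # Only the two largest counts enter the stopping decision.
        if prob_leader_ahead(a, b) >= confidence:
            return top[0][0], n
    # Budget exhausted: fall back to plain majority voting.
    return counts.most_common(1)[0][0], max_samples
```

With a sampler that always returns the same answer, this sketch stops after four calls, because the stand-in posterior reaches 31/32 on the fourth agreeing sample. A policy using an informative prior, as the paper proposes, could stop at a different point.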

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Algorithms