More Bang for the Buck: Improving the Inference of Large Language Models at a Fixed Budget using Reset and Discard (ReD)

Sagi Meir; Tommer D. Keidar; Noam Levi; Shlomi Reuveni; Barak Hirshberg

arXiv:2601.21522 · cs.LG · January 30, 2026


Open Access

TL;DR

This paper introduces ReD, a method to improve the efficiency of large language model inference by increasing coverage at a fixed budget, reducing costs, and enabling better measurement of model capabilities.

Contribution

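The paper proposes Reset-and-Discard (ReD), a query strategy that increases coverage@cost at any given budget and, when pass@k is known, quantitatively predicts the savings in total attempts. It targets the diminishing returns (sublinear coverage growth) that follow from the power-law behavior of pass@k.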

Findings

01. ReD significantly reduces attempts, tokens, and costs in experiments.

02. ReD effectively predicts savings and infers power-law exponents.

03. ReD improves inference efficiency across multiple LLMs.

Abstract

The performance of large language models (LLMs) on verifiable tasks is usually measured by pass@k, the probability of answering a question correctly at least once in k trials. At a fixed budget, a more suitable metric is coverage@cost, the average number of unique questions answered as a function of the total number of attempts. We connect the two metrics and show that the empirically observed power-law behavior in pass@k leads to sublinear growth of coverage@cost (diminishing returns). To solve this problem, we propose Reset-and-Discard (ReD), a method for querying LLMs that increases coverage@cost for any given budget, regardless of the form of pass@k. Moreover, given pass@k, we can quantitatively predict the savings in the total number of attempts using ReD. If pass@k is not available for the model, ReD can infer its power-law exponent. Experiments on three LLMs using HumanEval…
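
To make the pass@k definition in the abstract concrete, here is a minimal Python sketch. It is not the ReD method and is not taken from the paper. It assumes each question has a fixed per-attempt success probability and that attempts are independent; the function name `pass_at_k` and the probabilities in `probs` are hypothetical, chosen only for illustration.

```python
def pass_at_k(success_probs, k):
    """Average probability of at least one success in k independent attempts.

    Assumes (for illustration) a fixed per-attempt success probability
    for each question and independence between attempts.
    """
    return sum(1.0 - (1.0 - p) ** k for p in success_probs) / len(success_probs)


# Hypothetical per-question success probabilities (not from the paper).
probs = [0.9, 0.5, 0.1, 0.01]

# pass@k for k = 1..8, and the gain contributed by each extra attempt.
curve = [pass_at_k(probs, k) for k in range(1, 9)]
gains = [later - earlier for earlier, later in zip(curve, curve[1:])]

for k, value in enumerate(curve, start=1):
    print(f"pass@{k} = {value:.4f}")
```

Under these assumptions every additional attempt per question adds less to pass@k than the one before it, which is the kind of diminishing return the abstract refers to when attempts are spread uniformly over questions.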

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification