Confidence-Weighted Token Set Cover for Early Hypothesis Pruning in Self-Consistency
Md Arafat Sultan, Ramón Fernandez Astudillo

TL;DR
This paper makes self-consistency more token-efficient for long chain-of-thought reasoning by pruning intermediate hypotheses early, based on model confidence and lexical coverage. It improves token efficiency by 10-35% in many cases.
Contribution
The paper proposes an early hypothesis pruning technique, driven by a fast weighted set cover algorithm, that improves the token efficiency of self-consistency on long chain-of-thought reasoning tasks while preserving its parallelism.
Findings
Token efficiency improves for all five evaluated LLMs, by 10-35% in many cases.
The method maintains reasoning accuracy while reducing token use.
The evaluation covers five LLMs and three math benchmarks.
Abstract
Despite its simplicity and efficacy, the high token expenditure of self-consistency can limit its practical utility. Here we investigate if self-consistency can be made more token-efficient for long chain-of-thought reasoning tasks, while preserving its parallelism, through early hypothesis pruning. Concretely, we generate all solutions in parallel, but periodically prune intermediate hypotheses that are deemed unnecessary based on two lightweight indicators: (a) the model's own confidence in individual hypotheses, and (b) lexical coverage of all current hypotheses by candidate subsets that are under consideration for continued retention. We design a fast weighted set cover algorithm that utilizes the two indicators; our evaluation of five LLMs on three math benchmarks shows that this method can improve token efficiency for all models, by 10-35% in many cases.
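The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a confidence-weighted set cover could select which hypotheses to keep. It uses the textbook greedy heuristic for set cover, not necessarily the authors' method. The function names, the whitespace tokenizer, the `coverage_target` parameter, and the scoring rule (confidence times newly covered tokens) are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def token_set(text: str) -> Set[str]:
    """Lowercased whitespace tokens; a stand-in for the real lexical units (assumption)."""
    return set(text.lower().split())


def prune_hypotheses(
    hypotheses: Dict[str, str],
    confidence: Dict[str, float],
    coverage_target: float = 0.9,
) -> List[str]:
    """Greedily keep hypotheses until their tokens cover at least
    `coverage_target` of all tokens across the current hypotheses."""
    sets = {hid: token_set(text) for hid, text in hypotheses.items()}
    universe: Set[str] = set().union(*sets.values())
    needed = coverage_target * len(universe)

    covered: Set[str] = set()
    kept: List[str] = []
    remaining = set(sets)
    while remaining and len(covered) < needed:
        # Only hypotheses that add at least one uncovered token are candidates.
        candidates = [h for h in sorted(remaining) if sets[h] - covered]
        if not candidates:
            break
        # Hypothetical score: model confidence times number of newly covered tokens.
        best = max(candidates, key=lambda h: confidence[h] * len(sets[h] - covered))
        kept.append(best)
        covered |= sets[best]
        remaining.remove(best)
    return kept


# Toy partial solutions: h2 duplicates h1, so it adds no lexical coverage.
hypotheses = {
    "h1": "x equals 2 so answer is 4",
    "h2": "x equals 2 so answer is 4",
    "h3": "let y be 3 then answer is 9",
}
confidence = {"h1": 0.9, "h2": 0.5, "h3": 0.7}
kept = prune_hypotheses(hypotheses, confidence, coverage_target=1.0)
print(kept)  # ['h1', 'h3']
```

In this toy run the redundant, lower-confidence hypothesis `h2` is dropped, and only `h1` and `h3` would continue decoding. In a real system the pruning step would be applied periodically during parallel generation, as the abstract describes.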
