Bag of Tricks for Inference-time Computation of LLM Reasoning
Fan Liu, Wenshuo Chao, Naiqiang Tan, Hao Liu

TL;DR
This paper benchmarks and analyzes various inference-time strategies for improving reasoning performance in large language models, revealing effective techniques and establishing a standardized evaluation framework.
Contribution
It systematically evaluates inference-time computation methods across multiple models and tasks, highlighting overlooked strategies that significantly boost reasoning performance.
Findings
Tuning temperature can improve reasoning accuracy by up to 5%.
A standardized benchmark for inference-time methods is established.
Over 20,000 GPU hours were used for extensive experiments.
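To make the temperature finding concrete, the sketch below shows how temperature rescales logits before sampling. It is a generic illustration, not code from the paper: the function name `softmax_with_temperature` and the example logits are assumptions introduced here.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing by a sampling temperature.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three candidate tokens.
example_logits = [2.0, 1.0, 0.5]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(example_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, while higher temperatures flatten the distribution and make sampled candidates more diverse, which is why the setting can matter for sampling-based inference-time methods.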
Abstract
With the advancement of large language models (LLMs), solving complex reasoning tasks has gained increasing attention. Inference-time computation methods (e.g., Best-of-N and beam search) are particularly valuable as they can enhance reasoning performance without modifying model parameters or requiring additional training. However, these techniques come with implementation challenges, and most existing methods remain at the proof-of-concept stage with limited practical adoption due to their computational complexity and varying effectiveness across different tasks. In this paper, we investigate and benchmark diverse inference-time computation strategies across reasoning tasks of varying complexity. Since most current methods rely on a proposer-verifier pipeline that first generates candidate solutions (e.g., reasoning solutions) and then selects the best one based on reward signals…
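The proposer-verifier pipeline mentioned in the abstract can be sketched as a minimal Best-of-N loop. This is an illustration under stated assumptions, not the paper's implementation: `best_of_n`, `make_toy_proposer`, and `toy_reward` are hypothetical names, and the toy proposer and reward stand in for an LLM sampler and a reward model.

```python
import random
from typing import Callable, List


def best_of_n(prompt: str,
              propose: Callable[[str], str],
              score: Callable[[str, str], float],
              n: int = 4) -> str:
    """Sample n candidates from the proposer and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[str] = [propose(prompt) for _ in range(n)]
    # The verifier assigns a reward to each candidate; ties keep the earliest one.
    return max(candidates, key=lambda c: score(prompt, c))


def make_toy_proposer(seed: int = 0) -> Callable[[str], str]:
    """Hypothetical proposer: guesses an integer answer at random."""
    rng = random.Random(seed)

    def propose(prompt: str) -> str:
        return str(rng.randint(0, 10))

    return propose


def toy_reward(prompt: str, candidate: str) -> float:
    """Hypothetical verifier: rewards answers close to 5 (the answer to '2 + 3')."""
    return -abs(int(candidate) - 5)


answer = best_of_n("2 + 3 = ?", make_toy_proposer(), toy_reward, n=8)
print(answer)
```

In a real setting, `propose` would sample a reasoning trace from the model and `score` would call a reward model or other verifier; the selection step stays the same.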
Taxonomy
Topics: Semantic Web and Ontologies · Formal Methods in Verification · Business Process Modeling and Analysis
