Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence
Amirhosein Ghasemabadi, Keith G. Mills, Baochun Li, Di Niu

TL;DR
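Guided by Gut (GG) is a self-guided test-time scaling method for large language models. Instead of an external reward model, it guides a lightweight tree search with the model's own token-level confidence and step novelty, and uses reinforcement learning fine-tuning to make those confidence estimates more reliable.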
Contribution
Introduces GG, a novel self-guided TTS framework that replaces costly external verifiers with intrinsic signals and reinforcement learning, enabling smaller models to match larger ones efficiently.
Findings
Enables smaller models (e.g., 1.5B parameters) to reach accuracy comparable to significantly larger models (e.g., 70B).
Reduces GPU memory usage by up to 10x and inference time by 8x.
Decreases KV cache memory by approximately 50% compared to Best-of-N (BoN) sampling.
Abstract
Test-Time Scaling (TTS) methods for enhancing Large Language Model (LLM) reasoning often incur substantial computational costs, primarily due to extensive reliance on external Process Reward Models (PRMs) or sampling methods like Best-of-N (BoN). This paper introduces Guided by Gut (GG), an efficient self-guided TTS framework that achieves PRM-level performance without costly external verifier models. Our method employs a lightweight tree search guided solely by intrinsic LLM signals: token-level confidence and step novelty. One critical innovation is improving the reliability of internal confidence estimates via a targeted reinforcement learning fine-tuning phase. Empirical evaluations on challenging mathematical reasoning benchmarks demonstrate that GG enables smaller models (e.g., 1.5B parameters) to achieve accuracy matching or surpassing significantly larger models (e.g., 32B-70B…
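To make the idea of a search guided by token-level confidence and step novelty concrete, here is a minimal Python sketch. It is not the authors' implementation: the scoring formulas (mean token probability for confidence, one minus the highest Jaccard word overlap for novelty), the function names, and the `novelty_weight` parameter are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math


def step_confidence(token_logprobs):
    """Assumed confidence: mean probability of the sampled tokens in a step."""
    if not token_logprobs:
        return 0.0
    return sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)


def step_novelty(step, previous_steps):
    """Assumed novelty: 1 minus the highest Jaccard word overlap with earlier steps."""
    words = set(step.lower().split())
    if not words or not previous_steps:
        return 1.0
    best_overlap = 0.0
    for prev in previous_steps:
        prev_words = set(prev.lower().split())
        union = words | prev_words
        if union:
            best_overlap = max(best_overlap, len(words & prev_words) / len(union))
    return 1.0 - best_overlap


def pick_next_step(candidates, previous_steps, novelty_weight=0.5):
    """Pick the candidate with the highest combined score.

    candidates: list of (step_text, token_logprobs) pairs.
    novelty_weight: hypothetical trade-off between confidence and novelty.
    """
    def score(candidate):
        text, logprobs = candidate
        return step_confidence(logprobs) + novelty_weight * step_novelty(text, previous_steps)

    return max(candidates, key=score)[0]


# Usage: a confident but repeated step loses to a slightly less confident new one.
history = ["let x = 2"]
candidates = [
    ("let x = 2", [math.log(0.9), math.log(0.9)]),
    ("then x + 3 = 5", [math.log(0.8), math.log(0.8)]),
]
chosen = pick_next_step(candidates, history)
```

In this sketch no second model is consulted: every quantity comes from the generating model's own log-probabilities and the text produced so far, which is the property the abstract credits for the savings over PRM-guided search.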
Taxonomy
Topics: Image Processing Techniques and Applications · Neural Networks and Applications
