A*-Decoding: Token-Efficient Inference Scaling
Giannis Chatziveroglou

TL;DR
A*-decoding is a search-based inference strategy that optimally utilizes compute budgets during language model decoding, achieving strong reasoning performance with fewer tokens and passes.
Contribution
Introduces A*-decoding, a search-based inference method that improves both the reasoning accuracy and the token efficiency of language models.
Findings
Achieves comparable performance to larger models with fewer tokens and passes.
Enables smaller models to match larger models' reasoning accuracy.
Demonstrates structured search as an effective alternative to brute-force sampling.
Abstract
Inference-time scaling has emerged as a powerful alternative to parameter scaling for improving language model performance on complex reasoning tasks. While existing methods have shown strong performance gains under fixed compute budgets, there has been little focus on optimally utilizing that budget during inference. In this work, we introduce A*-decoding, a search-based inference-time strategy that builds on the A* search algorithm to optimally utilize a fixed compute budget by prioritizing high-quality reasoning paths during generation. We frame language model decoding as a structured search in a state space of partial solutions, applying the A* transition model to identify promising continuations guided by an external process supervision signal. In our experiments, A*-decoding reaches the performance levels of strong inference scaling baselines like best-of-N and particle filtering…
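To make the framing in the abstract concrete, here is a minimal sketch of A* search over partial solutions under a fixed expansion budget. It is a toy illustration and not the paper's implementation: the names `a_star_decode`, `propose`, `is_goal`, and `heuristic` are hypothetical, `propose` stands in for sampling continuations from a language model, and `heuristic` stands in for the external process supervision signal. The cost and heuristic the paper actually uses are not given in this summary, so the sketch falls back on a simple step count plus an admissible estimate of the remaining steps.

```python
import heapq
import itertools
import math
from typing import Callable, Iterable, Optional, Tuple

State = Tuple[int, ...]  # a partial solution: the sequence of steps taken so far


def a_star_decode(
    start: State,
    propose: Callable[[State], Iterable[int]],
    is_goal: Callable[[State], bool],
    heuristic: Callable[[State], float],
    max_expansions: int,
) -> Optional[State]:
    """Generic A* over partial solutions.

    Returns the first goal state popped from the frontier, or None if the
    expansion budget is exhausted first.
    """
    counter = itertools.count()  # tie-breaker so heapq never compares states
    frontier = [(heuristic(start), next(counter), start)]
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, state = heapq.heappop(frontier)
        if is_goal(state):
            return state
        expansions += 1
        for step in propose(state):
            child = state + (step,)
            g = len(child)        # cost so far: one unit per step (assumption)
            h = heuristic(child)  # estimated cost to finish from this state
            if math.isinf(h):
                continue          # dead end: prune instead of queueing
            heapq.heappush(frontier, (g + h, next(counter), child))
    return None


# Toy problem (hypothetical): reach a sum of 7 using steps of size 1, 2, or 3
# in as few steps as possible.
TARGET = 7


def propose(state: State) -> Iterable[int]:
    return (1, 2, 3)  # stand-in for candidate continuations from a model


def is_goal(state: State) -> bool:
    return sum(state) == TARGET


def heuristic(state: State) -> float:
    remaining = TARGET - sum(state)
    if remaining < 0:
        return math.inf  # overshot the target; cannot be repaired
    return math.ceil(remaining / 3)  # never overestimates the steps left


result = a_star_decode((), propose, is_goal, heuristic, max_expansions=50)
print(result, "steps:", len(result))
```

The priority queue is what separates this from brute-force sampling: the search always extends the partial solution with the lowest estimated total cost, so the budget goes to promising paths first. In a real setting the quality of the scoring signal would determine how well that prioritization works.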
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
