A*-Decoding: Token-Efficient Inference Scaling

Giannis Chatziveroglou

arXiv:2505.13672·cs.AI·May 21, 2025

TL;DR

A*-decoding is a search-based inference strategy that optimally utilizes compute budgets during language model decoding, achieving strong reasoning performance with fewer tokens and passes.

Contribution

Introduces A*-decoding, a search-based inference method that improves the reasoning efficiency and performance of language models.

Findings

01

Reaches the performance of strong inference-scaling baselines such as best-of-N and particle filtering with fewer tokens and passes.

02

Enables smaller models to match larger models' reasoning accuracy.

03

Demonstrates structured search as an effective alternative to brute-force sampling.

Abstract

Inference-time scaling has emerged as a powerful alternative to parameter scaling for improving language model performance on complex reasoning tasks. While existing methods have shown strong performance gains under fixed compute budgets, there has been little focus on optimally utilizing that budget during inference. In this work, we introduce A*-decoding, a search-based inference-time strategy that builds on the A* search algorithm to optimally utilize a fixed compute budget by prioritizing high-quality reasoning paths during generation. We frame language model decoding as a structured search in a state space of partial solutions, applying the A* transition model to identify promising continuations guided by an external process supervision signal. In our experiments, A*-decoding reaches the performance levels of strong inference scaling baselines like best-of-N and particle filtering…
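To make the framing in the abstract concrete, here is a minimal sketch of decoding as a search over partial solutions under a fixed expansion budget. It is a plain best-first search with a priority queue, which is the skeleton A* builds on; the paper's actual cost function, heuristic, and transition model are not given in this excerpt, so the priority below is simply the output of a scorer. The names `a_star_decode`, `expand`, `score`, `is_complete`, and `max_expansions` are hypothetical, and the toy functions stand in for a language model and a process supervision signal.

```python
import heapq
from typing import Callable, List, Optional, Tuple

State = Tuple[str, ...]  # a partial solution: the reasoning steps generated so far


def a_star_decode(
    expand: Callable[[State], List[str]],  # hypothetical: proposes next steps (an LM call)
    score: Callable[[State], float],       # hypothetical: external process-supervision score
    is_complete: Callable[[State], bool],  # hypothetical: has the solution terminated?
    max_expansions: int,                   # the fixed compute budget, counted in expansions
) -> Optional[State]:
    start: State = ()
    counter = 0  # unique tie-breaker so heap entries never compare states directly
    # heapq is a min-heap, so the score is negated to pop the best state first.
    frontier = [(-score(start), counter, start)]
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, state = heapq.heappop(frontier)
        if is_complete(state):
            return state
        expansions += 1
        for step in expand(state):
            child = state + (step,)
            counter += 1
            heapq.heappush(frontier, (-score(child), counter, child))
    # Budget exhausted: fall back to the best complete state left on the frontier.
    complete = [item for item in frontier if is_complete(item[2])]
    return min(complete)[2] if complete else None


# Toy stand-ins (assumptions for illustration only, not the paper's components).
def toy_expand(state: State) -> List[str]:
    return ["a", "b"]  # two candidate continuations per state


def toy_score(state: State) -> float:
    return 0.5 ** state.count("b")  # every "b" step halves the score


def toy_is_complete(state: State) -> bool:
    return len(state) == 3


result = a_star_decode(toy_expand, toy_score, toy_is_complete, max_expansions=10)
print(result)
```

In the toy run the search only ever expands the highest-scoring path, so it finishes after three expansions, whereas sampling every length-3 path would generate eight. That is the intuition behind spending a fixed budget on promising continuations rather than on independent samples.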

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
