MarkovScale: Towards Optimal Sequential Scaling at Inference Time
Youkang Wang, Jian Wang, Rubing Chen, Tianyi Zeng, Xiao-Yong Wei, Qing Li

TL;DR
MarkovScale introduces a principled Markov process framework for sequential scaling at inference time, providing theoretical bounds and a system that outperforms existing methods across multiple large language models and benchmarks.
Contribution
It models sequential scaling as a two-state Markov process, deriving optimality conditions and developing a system that balances accuracy and efficiency based on these principles.
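As a rough illustration of what a two-state formulation implies, the block below writes out the standard closed form for a two-state Markov chain. It is a textbook result under assumed notation, not the paper's own derivation: $p_n$ is the probability that the answer is correct after $n$ refinement rounds, $a$ is the assumed probability that a round turns an incorrect answer into a correct one, $b$ is the assumed probability that a round turns a correct answer into an incorrect one, and $a + b > 0$.

```latex
% Assumed notation (hypothetical, not taken from the paper):
%   p_n : probability the answer is correct after n rounds
%   a   : P(incorrect -> correct) in one round
%   b   : P(correct -> incorrect) in one round
\begin{align}
p_{n+1} &= (1-b)\,p_n + a\,(1-p_n), \\
p_n     &= \pi + (p_0 - \pi)\,(1-a-b)^n, \qquad \pi = \frac{a}{a+b}, \\
p_1 > p_0 &\iff a\,(1-p_0) > b\,p_0 \iff p_0 < \pi .
\end{align}
```

Under these assumptions, an extra round helps only while the current accuracy is below the stationary value $\pi$, and when $0 \le 1-a-b < 1$ the accuracy approaches $\pi$ monotonically.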
Findings
Outperforms state-of-the-art scaling methods across multiple benchmarks
Provides theoretical bounds for accuracy improvements
Demonstrates effectiveness on diverse LLMs and configurations
Abstract
Sequential scaling is a prominent inference-time scaling paradigm, yet its performance improvements are typically modest and not well understood, largely due to the prevalence of heuristic, non-principled approaches that obscure clear optimality bounds. To address this, we propose a principled framework that models sequential scaling as a two-state Markov process. This approach reveals the underlying properties of sequential scaling and yields closed-form solutions for essential aspects, such as the specific conditions under which accuracy is improved and the theoretical upper, neutral, and lower performance bounds. Leveraging this formulation, we develop MarkovScale, a practical system that applies these optimality criteria to achieve a theoretically grounded balance between accuracy and efficiency. Comprehensive experiments across 3 backbone LLMs, 5 benchmarks, and over 20…
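The two-state view described in the abstract can be sketched numerically. The snippet below is a minimal illustration, not the MarkovScale system: the function names and the parameters `p0` (zero-shot accuracy), `a` (chance a round fixes an incorrect answer), and `b` (chance a round breaks a correct answer) are hypothetical, and the closed form is the generic one for any two-state Markov chain.

```python
def accuracy_closed_form(n, p0, a, b):
    """Accuracy after n rounds via the generic two-state closed form.

    p0, a, b are illustrative parameters, not values from the paper.
    """
    if a + b == 0:
        # No transitions ever occur, so accuracy never changes.
        return p0
    stationary = a / (a + b)
    return stationary + (p0 - stationary) * (1 - a - b) ** n


def accuracy_iterated(n, p0, a, b):
    """Accuracy after n rounds by applying the one-step update n times."""
    p = p0
    for _ in range(n):
        # Correct answers stay correct with prob. 1 - b;
        # incorrect answers become correct with prob. a.
        p = p * (1 - b) + (1 - p) * a
    return p


def next_round_improves(p, a, b):
    """True if one more round raises expected accuracy from p."""
    return a * (1 - p) > b * p


if __name__ == "__main__":
    for n in range(6):
        print(n, round(accuracy_closed_form(n, 0.6, 0.3, 0.1), 4))
```

In this sketch, a stopping rule would simply halt once `next_round_improves` returns `False`; how the actual system estimates the transition probabilities is not specified in this excerpt.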
Peer Reviews
Decision: Submitted to ICLR 2026
* The paper tackles a very important topic: token-efficient inference-time scaling.
* The paper tries to move from previous heuristic-based approaches to one that is better backed by a theoretical formulation, which seems nice.
* The formulation seems rather oversimplified, as the overall reasoning process in inference-time scaling is not easy to boil down to simply correct vs. incorrect. It is literally a reasoning process where things can go south and then serve as context for later converging on a better outcome. The oversimplified formulation seems to understate the significance of this.
* It is rather unclear how the transition probabilities and the zero-shot probability are computed in MarkovScale.
- Clear theoretical framework.
- Easy-to-understand approach that still provides good empirical accuracy.
- Clear and consistent improvements on many benchmarks.
- I am too far away from the field to judge this in detail
- This work is timely and addresses a key question about test-time scaling, namely how much to do and in what circumstances it is beneficial to do so.
- The performance bounds are nice and give a benchmark against which to compare methods.
- Despite some problems with the exposition, the central idea in this paper is quite elegant and uncomplicated.
- Figure 3 could do with some improvement. I think a bar chart or some other chart that doesn't imply an interpolation between benchmarks might be more appropriate.
- The exposition in section 3.3 is a bit sloppy. For instance, where does this theoretical bias term originate from? What does $q$ represent (a question, I'm guessing)? Is $p$ in (6), (7) and (8) $p_0$ or $p_i$?
- I think the framing around section 3.3 could do with some justification: I'm a bit skeptical about model capability and pr
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning in Healthcare · Generative Adversarial Networks and Image Synthesis
