MarkovScale: Towards Optimal Sequential Scaling at Inference Time

Youkang Wang; Jian Wang; Rubing Chen; Tianyi Zeng; Xiao-Yong Wei; Qing Li

arXiv:2602.01120 · cs.LG · February 3, 2026

TL;DR

MarkovScale introduces a principled Markov process framework for sequential scaling at inference time, providing theoretical bounds and a system that outperforms existing methods across multiple large language models and benchmarks.

Contribution

It models sequential scaling as a two-state Markov process, deriving optimality conditions and developing a system that balances accuracy and efficiency based on these principles.
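The kind of closed-form result a two-state formulation yields can be illustrated with a generic derivation. This is a hedged sketch, not necessarily the paper's notation. Let state C mean the current answer is correct and I that it is incorrect after refinement step t. The following symbols are assumptions introduced here:

- p_t = P(C at step t), with p_0 the zero-shot accuracy
- \alpha = P(C → C), the probability of keeping a correct answer
- \beta = P(I → C), the probability of fixing an incorrect one

Both transition probabilities are assumed constant across steps.

```latex
\begin{align}
p_{t+1} &= \alpha\,p_t + \beta\,(1 - p_t) \\
p^{*} &= \frac{\beta}{1 - \alpha + \beta} \quad \text{(fixed point: } p^{*} = \alpha\,p^{*} + \beta\,(1 - p^{*})\text{)} \\
p_{t+1} - p_t &= \beta\,(1 - p_t) - (1 - \alpha)\,p_t = (1 - \alpha + \beta)\,(p^{*} - p_t) \\
p_t &= p^{*} + (p_0 - p^{*})\,(\alpha - \beta)^{t}
\end{align}
```

Under these assumptions, one more step raises expected accuracy exactly when expected fixes exceed expected breaks, \beta(1 - p_t) > (1 - \alpha)p_t. Equivalently, when 1 - \alpha + \beta > 0, this holds exactly when p_t < p^{*}. The paper's own conditions and bounds may be parameterized differently.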

Findings

01. Outperforms state-of-the-art scaling methods across multiple benchmarks

02. Provides theoretical bounds for accuracy improvements

03. Demonstrates effectiveness on diverse LLMs and configurations

Abstract

Sequential scaling is a prominent inference-time scaling paradigm, yet its performance improvements are typically modest and not well understood, largely due to the prevalence of heuristic, non-principled approaches that obscure clear optimality bounds. To address this, we propose a principled framework that models sequential scaling as a two-state Markov process. This approach reveals the underlying properties of sequential scaling and yields closed-form solutions for essential aspects, such as the specific conditions under which accuracy is improved and the theoretical upper, neutral, and lower performance bounds. Leveraging this formulation, we develop MarkovScale, a practical system that applies these optimality criteria to achieve a theoretically grounded balance between accuracy and efficiency. Comprehensive experiments across 3 backbone LLMs, 5 benchmarks, and over 20…
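The decision the abstract describes can be sketched for a generic two-state chain: is another sequential step expected to improve accuracy? This is a hypothetical illustration, not MarkovScale's implementation. The following are assumptions introduced here:

- the class name `TwoStateChain`;
- the parameters `p0` (zero-shot accuracy), `keep` (probability a correct answer stays correct) and `fix` (probability an incorrect answer becomes correct);
- the `min_gain` stopping rule;
- treating the transition probabilities as constant across steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoStateChain:
    """Hypothetical two-state (correct/incorrect) model of sequential scaling.

    Not MarkovScale's code. Assumes time-homogeneous transitions:
      p0   -- zero-shot accuracy, P(correct) before any extra step
      keep -- P(correct -> correct) per step
      fix  -- P(incorrect -> correct) per step
    """
    p0: float
    keep: float
    fix: float

    def stationary(self) -> float:
        """Fixed point p* = fix / (1 - keep + fix) that accuracy converges to."""
        denom = 1.0 - self.keep + self.fix
        if denom == 0.0:
            # keep == 1 and fix == 0: no state ever changes, accuracy stays at p0.
            return self.p0
        return self.fix / denom

    def accuracy(self, steps: int) -> float:
        """Closed form p_t = p* + (p0 - p*) * (keep - fix) ** t."""
        p_star = self.stationary()
        return p_star + (self.p0 - p_star) * (self.keep - self.fix) ** steps

    def step_gain(self, p: float) -> float:
        """Expected one-step change at accuracy p: fixes minus breaks."""
        return self.fix * (1.0 - p) - (1.0 - self.keep) * p

    def worth_scaling(self) -> bool:
        """True if one extra step raises expected accuracy above p0."""
        return self.step_gain(self.p0) > 0.0

    def steps_until_gain_below(self, min_gain: float, max_steps: int = 64) -> int:
        """Hypothetical stopping rule: refine while the expected gain >= min_gain."""
        p, t = self.p0, 0
        while t < max_steps and self.step_gain(p) >= min_gain:
            t += 1
            p = self.accuracy(t)
        return t


if __name__ == "__main__":
    helpful = TwoStateChain(p0=0.6, keep=0.9, fix=0.3)
    print(round(helpful.stationary(), 4))        # 0.75
    print(round(helpful.accuracy(2), 4))         # 0.696
    print(helpful.worth_scaling())               # True
    print(helpful.steps_until_gain_below(0.02))  # 3

    harmful = TwoStateChain(p0=0.8, keep=0.85, fix=0.2)
    print(harmful.worth_scaling())               # False
```

The one-step gain test is a myopic form of the condition p_t < p*. When `keep - fix >= 0`, accuracy moves monotonically toward p*. A negative gain at `p0` then means further steps are expected only to lower accuracy, as in the `harmful` case.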

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 3

Strengths

* The paper is tackling a very important topic: token-efficient inference-time scaling.
* It seems the paper is trying to move from previous heuristic-based approaches to one that is somewhat more grounded in a theoretical formulation, which is nice.

Weaknesses

* The formulation seems rather oversimplified, as the overall reasoning process in inference-time scaling is not easy to reduce to simply correct vs. incorrect. It is genuinely a reasoning process: things can go wrong, and that detour can then serve as context for later converging on a better outcome. The oversimplified formulation seems to understate the significance of this.
* It is rather unclear how the transition probabilities and the zero-shot probability are computed in MarkovScale.

Reviewer 02 · Rating 6 · Confidence 2

Strengths

- Clear theoretical framework.
- Easy-to-understand approach that still provides good empirical accuracy.
- Clear and consistent improvements across many benchmarks.

Weaknesses

- I am too far from the field to judge this in detail.

Reviewer 03 · Rating 6 · Confidence 3

Strengths

- This work is timely and addresses a key question about test-time scaling, namely how much of it to do and in what circumstances it is beneficial.
- The performance bounds are nice and provide a benchmark against which to compare methods.
- Despite some problems with the exposition, the central idea of this paper is quite elegant and uncomplicated.

Weaknesses

- Figure 3 could be improved. A bar chart, or another chart type that does not imply interpolation between benchmarks, might be more appropriate.
- The exposition in Section 3.3 is a bit sloppy. For instance, where does this theoretical bias term originate? What does $q$ represent (a question, I'm guessing)? Is $p$ in (6), (7), and (8) $p_0$ or $p_i$?
- I think the framing around Section 3.3 could use some justification: I'm a bit skeptical about model capability and pr

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning in Healthcare · Generative Adversarial Networks and Image Synthesis