Breaking the Capability Ceiling of LLM Post-Training by Reintroducing Markov States

Yurun Yuan; Tengyang Xie

arXiv:2603.19987 · cs.LG · March 23, 2026

TL;DR

This paper shows that reintroducing explicit Markov states into reinforcement learning for large language models can break the "capability ceiling" of current post-training, in which RL mostly refines patterns already latent in the pre-trained weights instead of discovering novel strategies.

Contribution

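The paper replaces the ever-expanding action history used in current LLM post-training formulations with explicit Markov states. It supports this with theoretical guarantees that estimated Markov states can significantly reduce sample complexity, together with empirical evidence of improved performance.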

Findings

01. Markov states reduce sample complexity in LLM RL

02. Explicit Markov states break performance ceilings

03. Structured states enhance reasoning capabilities

Abstract

Reinforcement learning (RL) has become a standard paradigm for post-training and aligning Large Language Models (LLMs), yet recent evidence suggests it faces a persistent "capability ceiling": unlike classical RL systems that discover novel strategies, RL for LLMs often acts as a mere refiner of patterns already latent in pre-trained weights. In this work, we identify a fundamental structural bottleneck: while classical RL relies on compact, informative Markov states, current LLM post-training formulations are tethered to an ever-expanding history of actions. We revisit a classical principle long central to RL yet absent from LLM post-training: explicit Markov states. Theoretically, we provide rigorous guarantees demonstrating that leveraging estimated Markov states can significantly reduce sample complexity. Empirically, we show that introducing Markov states consistently breaks the…
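
The contrast the abstract draws between an ever-expanding action history and a compact Markov state can be illustrated with a small sketch. This is not the authors' method or environment: the toy task, the action set, and the names `history_state`, `markov_state`, and `count_states` are all assumptions made for illustration. In the toy task each action adds +1 or -1 to a counter, and we assume the future depends only on the current counter value, so the running sum serves as a Markov state.

```python
from itertools import product

# Hypothetical toy setup (not from the paper): each action adds +1 or -1.
ACTIONS = (+1, -1)
HORIZON = 6


def history_state(actions):
    """State = the full action history, which grows with every step."""
    return tuple(actions)


def markov_state(actions):
    """State = the running sum, assumed sufficient for this toy task."""
    return sum(actions)


def count_states(state_fn, horizon):
    """Count distinct states reachable within `horizon` steps."""
    seen = set()
    for t in range(horizon + 1):
        for seq in product(ACTIONS, repeat=t):
            seen.add(state_fn(seq))
    return len(seen)


n_history = count_states(history_state, HORIZON)
n_markov = count_states(markov_state, HORIZON)

print(n_history)  # 127 distinct histories (1 + 2 + 4 + ... + 64)
print(n_markov)   # 13 distinct sums (every integer from -6 to 6)
```

In this toy the number of history states doubles with each step while the number of Markov states grows linearly. That gap is one intuition for why a compact state could lower sample complexity, though the paper's actual guarantees rest on its own assumptions and analysis.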

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Topic Modeling