Post-Training as Reweighting: A Stochastic View of Reasoning Trajectories in Language Models
Dake Bu, Wei Huang, Andi Han, Atsushi Nitanda, Bo Xue, Qingfu Zhang, Hau-San Wong, Taiji Suzuki

TL;DR
This paper models reasoning in language models as stochastic trajectories, revealing how post-training methods reweight reasoning paths and affect the model's ability to handle complex tasks, supported by theoretical and empirical analysis.
Contribution
It introduces a stochastic trajectory framework for understanding post-training reasoning, highlighting how reweighting influences reasoning diversity and the handling of tasks of varying difficulty.
Findings
Post-training reweights reasoning trajectories, favoring high-probability paths.
Rare but crucial reasoning paths are suppressed by common post-training methods.
Exploration techniques help preserve low-probability, essential reasoning trajectories.
Abstract
Foundation models encode rich structural knowledge but often rely on post-training procedures to adapt their reasoning behavior to specific tasks. Popular approaches such as reinforcement learning with verifiable rewards (RLVR) and inference-time reward aggregation are typically analyzed from a performance perspective, leaving their effects on the underlying reasoning distribution less understood. In this work, we study post-training reasoning from a stochastic trajectory viewpoint. Following Kim et al. (2025), we model reasoning steps of varying difficulty as Markov transitions with different probabilities, and formalize reasoning processes using tree-structured Markov chains. Within this framework, pretraining corresponds to discovering the reasoning structure, while post-training primarily reweights existing chains of thought. We show that both RLVR and inference-time reward…
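As a rough illustration of the "reweighting" view described in the abstract, the sketch below builds a toy tree-structured Markov chain and applies an exponential reward tilt to its root-to-leaf trajectories. Everything in it is an assumption made for illustration: the tree, the transition probabilities, the `reward` function, the `beta` temperature, and the tilting rule itself (a generic reward-based reweighting, not necessarily the construction analyzed in the paper). It is meant only to show what "reweighting existing chains of thought" can look like in code, not to reproduce the paper's results.

```python
import math

# Hypothetical toy tree: each internal node maps to {child: transition probability}.
# Nodes without an entry are leaves. All names and numbers are illustrative only.
TREE = {
    "root": {"easy": 0.9, "hard": 0.1},
    "easy": {"wrong_a": 0.7, "right_a": 0.3},
    "hard": {"right_b": 0.8, "wrong_b": 0.2},
}


def trajectories(tree, node="root", prefix=(), prob=1.0):
    """Yield (path, probability) for every root-to-leaf path in the tree."""
    path = prefix + (node,)
    children = tree.get(node)
    if not children:
        # Leaf reached: the path probability is the product of its transitions.
        yield path, prob
        return
    for child, p in children.items():
        yield from trajectories(tree, child, path, prob * p)


def reward(path):
    """Assumed verifiable reward: 1 if the final step is a 'right' leaf, else 0."""
    return 1.0 if path[-1].startswith("right") else 0.0


def reweight(base, reward_fn, beta=1.0):
    """Exponential tilt: q(t) proportional to p(t) * exp(reward(t) / beta)."""
    weights = {t: p * math.exp(reward_fn(t) / beta) for t, p in base.items()}
    z = sum(weights.values())
    return {t: w / z for t, w in weights.items()}


base = dict(trajectories(TREE))
tilted = reweight(base, reward, beta=0.5)

for path, p in base.items():
    print(" -> ".join(path), f"base={p:.3f}", f"tilted={tilted[path]:.3f}")
```

In this toy setup the tilt moves mass toward rewarded trajectories but never creates a path that the base chain did not already contain, and the ratio between the two rewarded paths is unchanged, so the path through the low-probability `hard` step stays comparatively rare after reweighting.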
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · Ethics and Social Impacts of AI
