Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation

Juno Kim; Denny Wu; Jason Lee; Taiji Suzuki

arXiv:2502.01694·cs.AI·March 4, 2025

Open Access

TL;DR

This paper models chain-of-thought reasoning as a metastable Markov process, demonstrating that search, reinforcement learning, and distillation can improve reasoning efficiency and capabilities in large language models.

Contribution

The paper introduces a metastable Markov process framework for reasoning, proves the benefits of search, and proposes finetuning and distillation methods to enhance reasoning models.

Findings

01

Search reduces the expected number of steps needed to reach different reasoning clusters.

02

Limitations exist when using only local information of the pretrained graph.

03

Distillation creates a smaller, more efficient reasoning model.

Abstract

A key paradigm to improve the reasoning capabilities of large language models (LLMs) is to allocate more inference-time compute to search against a verifier or reward model. This process can then be utilized to refine the pretrained model or distill its reasoning patterns into more efficient models. In this paper, we study inference-time compute by viewing chain-of-thought (CoT) generation as a metastable Markov process: easy reasoning steps (e.g., algebraic manipulations) form densely connected clusters, while hard reasoning steps (e.g., applying a relevant theorem) create sparse, low-probability edges between clusters, leading to phase transitions at longer timescales. Under this framework, we prove that implementing a search protocol that rewards sparse edges improves CoT by decreasing the expected number of steps to reach different clusters. In contrast, we establish a limit on…
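To make the metastable picture concrete, here is a minimal sketch of a toy chain, not the paper's construction. It assumes a single dense cluster of `n` states in which every step is uniform over the cluster, except that one hypothetical "gate" state escapes to the next cluster with small probability `eps` (the sparse edge). The function names, the cluster size, and the two values of `eps` are illustrative assumptions. Raising `eps` stands in, very loosely, for a search protocol that rewards the sparse edge.

```python
def cluster_chain(n, eps):
    """Transitions among the n states of one dense cluster (toy model).

    Every state moves uniformly within the cluster. The last state is an
    assumed "gate": with probability eps it leaves for the next cluster,
    so its row sums to 1 - eps instead of 1.
    """
    Q = [[1.0 / n] * n for _ in range(n)]
    Q[n - 1] = [(1.0 - eps) / n] * n
    return Q


def expected_hitting_times(Q):
    """Solve (I - Q) h = 1 by Gauss-Jordan elimination.

    h[i] is the expected number of steps to leave the cluster when
    starting from state i.
    """
    n = len(Q)
    # Augmented matrix [I - Q | 1].
    A = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(n)] + [1.0]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


if __name__ == "__main__":
    n = 5
    for eps in (0.01, 0.1):  # pretrained sparse edge vs. an up-weighted one
        h = expected_hitting_times(cluster_chain(n, eps))
        print(f"eps={eps}: expected steps from state 0 = {h[0]:.1f}")
```

For this particular toy chain the expected escape time from a non-gate state works out to 1 + n/eps, so increasing the weight of the sparse edge tenfold shortens the wait by roughly the same factor. The sketch illustrates only the qualitative effect described in the abstract, not the paper's quantitative bounds.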

Taxonomy

Topics: Computability, Logic, AI Algorithms