Demystifying Long Chain-of-Thought Reasoning in LLMs

Edward Yeo; Yuxuan Tong; Morry Niu; Graham Neubig; Xiang Yue

arXiv:2502.03373 · cs.CL · February 6, 2025 · 3 cites

TL;DR

This paper investigates the mechanisms behind long chain-of-thought reasoning in large language models, emphasizing the roles of supervised fine-tuning and reinforcement learning in developing and stabilizing these reasoning capabilities.

Contribution

The paper systematically analyzes how long CoT reasoning emerges in LLMs and offers practical guidance on training strategies, reward shaping, and scaling compute to improve reasoning.

Findings

01

Supervised fine-tuning improves training efficiency but is not essential.

02

Reasoning capabilities tend to emerge with increased training compute, but their development is not guaranteed, which makes reward shaping crucial.

03

Scaling verifiable reward signals, especially with noisy web data, enhances out-of-distribution reasoning.

Abstract

Scaling inference compute enhances reasoning in large language models (LLMs), with long chains-of-thought (CoTs) enabling strategies like backtracking and error correction. Reinforcement learning (RL) has emerged as a crucial method for developing these capabilities, yet the conditions under which long CoTs emerge remain unclear, and RL training requires careful design choices. In this study, we systematically investigate the mechanics of long CoT reasoning, identifying the key factors that enable models to generate long CoT trajectories. Through extensive supervised fine-tuning (SFT) and RL experiments, we present four main findings: (1) While SFT is not strictly necessary, it simplifies training and improves efficiency; (2) Reasoning capabilities tend to emerge with increased training compute, but their development is not guaranteed, making reward shaping crucial for stabilizing CoT…
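
To make the idea of reward shaping in finding (2) concrete, here is a minimal sketch of a shaped reward for RL on verifiable problems. It is an illustration only, not the reward used in the paper: the function name `shaped_reward`, the parameters `overflow_penalty` and `length_weight`, and the linear length discount are all assumptions chosen for simplicity.

```python
def shaped_reward(
    is_correct: bool,
    num_tokens: int,
    max_tokens: int,
    overflow_penalty: float = -1.0,  # hypothetical value
    length_weight: float = 0.2,      # hypothetical value
) -> float:
    """Hypothetical shaped reward: verifiable correctness plus length terms.

    - A response that hits the token budget gets a fixed penalty, since it
      was cut off before it could commit to an answer.
    - A correct response gets a reward that is mildly discounted as it
      uses more of the budget.
    - An incorrect response within budget gets zero.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if num_tokens >= max_tokens:
        return overflow_penalty

    base = 1.0 if is_correct else 0.0
    used_fraction = num_tokens / max_tokens
    return base * (1.0 - length_weight * used_fraction)
```

For example, `shaped_reward(True, 50, 100)` gives a slightly lower reward than `shaped_reward(True, 10, 100)`, while `shaped_reward(True, 100, 100)` returns the overflow penalty. The point of the sketch is only that the reward depends on more than correctness, so it can discourage unbounded growth in CoT length.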

Code & Models

Repositories

eddycmu/demystify-long-cot
PyTorch · Official

Taxonomy

Topics: Semantic Web and Ontologies · Business Process Modeling and Analysis · Multi-Agent Systems and Negotiation

Methods: Balanced Selection · Shrink and Fine-Tune