Beyond What Seems Necessary: Hidden Gains from Scaling Training-Time Reasoning Length under Outcome Supervision

Yihao Xue; Allan Zhang; Jianhao Huang; Amit Sahai; Baharan Mirzasoleiman

arXiv:2602.00927·cs.LG·February 3, 2026

Open Access

TL;DR

This paper reveals that increasing training-time reasoning length under outcome supervision can improve out-of-distribution performance even after in-distribution performance saturates, due to stronger inductive biases and reduced shortcut reliance.

Contribution

It identifies a previously unreported phenomenon and gives theoretical explanations for how longer training-time reasoning improves OOD generalization, supported by experiments.

Findings

01. OOD performance continues to improve as training-time reasoning length increases, even after ID performance has saturated

02. Self-iteration induces stronger inductive biases

03. Regularization reduces reliance on shortcut solutions

Abstract

Training LLMs to think and reason for longer has become a key ingredient in building state-of-the-art models that can solve complex problems previously out of reach. Recent efforts pursue this in different ways, such as RL fine-tuning to elicit long CoT or scaling latent reasoning through architectural recurrence. This makes reasoning length an important scaling knob. In this work, we identify a novel phenomenon (both theoretically and experimentally): under outcome-only supervision, out-of-distribution (OOD) performance can continue improving as training-time reasoning length (e.g., the token budget in RL, or the loop count in looped Transformers) increases, even after in-distribution (ID) performance has saturated. This suggests that robustness may require a larger budget than ID validation alone would indicate. We provide theoretical explanations via two mechanisms: (i)…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling