Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL

Songjun Tu; Jiahao Lin; Qichao Zhang; Xiangyu Tian; Linjing Li; Xiangyuan Lan; Dongbin Zhao

arXiv:2505.10832 · cs.CL · October 10, 2025

TL;DR

This paper introduces AutoThink, a multi-stage reinforcement learning framework that enables large reasoning models to adaptively decide when to perform explicit reasoning, improving efficiency without sacrificing accuracy.

Contribution

The paper presents an RL-based method that lets large reasoning models (LRMs) decide when to reason explicitly, reducing unnecessary computation on simple tasks.

Findings

01. AutoThink improves accuracy by 6.4% on benchmark tasks.
02. It reduces token usage by 52%, enhancing efficiency.
03. The method is compatible with various R1-style models.

Abstract

Large reasoning models (LRMs) are proficient at generating explicit, step-by-step reasoning sequences before producing final answers. However, such detailed reasoning can introduce substantial computational overhead and latency, particularly for simple problems. To address this over-thinking problem, we explore how to equip LRMs with adaptive thinking capabilities: enabling them to dynamically decide whether or not to engage in explicit reasoning based on problem complexity. Building on R1-style distilled models, we observe that inserting a simple ellipsis ("...") into the prompt can stochastically trigger either a thinking or no-thinking mode, revealing a latent controllability in the reasoning behavior. Leveraging this property, we propose AutoThink, a multi-stage reinforcement learning (RL) framework that progressively optimizes reasoning policies via stage-wise reward shaping.…
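
The ellipsis observation in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not the authors' code: the `User:`/`Assistant:` layout, the `<think>` and `</think>` tags, the position of the ellipsis right after the opening tag, and the helper names `build_prompt`, `used_thinking`, and `thinking_rate`. The sketch builds a prompt with an optional ellipsis and labels a sampled completion as thinking or no-thinking depending on whether any text appears before the closing tag.

```python
ELLIPSIS = "..."  # the trigger string quoted in the abstract


def build_prompt(question: str, use_ellipsis: bool = True) -> str:
    """Assemble an R1-style prompt (hypothetical template, not the paper's)."""
    prompt = f"User: {question}\nAssistant: <think>\n"
    if use_ellipsis:
        # Assumed placement: directly after the opening think tag.
        prompt += ELLIPSIS + "\n"
    return prompt


def used_thinking(completion: str) -> bool:
    """Return True if non-whitespace text precedes </think> in the completion.

    A completion that closes the tag immediately counts as no-thinking.
    A completion that never closes the tag counts as thinking if non-empty.
    """
    head, _sep, _tail = completion.partition("</think>")
    return head.strip() != ""


def thinking_rate(completions: list[str]) -> float:
    """Fraction of completions that contain explicit reasoning."""
    if not completions:
        raise ValueError("need at least one completion")
    return sum(used_thinking(c) for c in completions) / len(completions)
```

Sampling several completions per prompt from a model and passing them to `thinking_rate` would give a rough estimate of how often each mode is triggered, which is the kind of signal a stage-wise reward could then act on.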

Taxonomy

Topics: Topic Modeling · Data Stream Mining Techniques · Machine Learning and Data Classification

Methods: Pruning