L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning

Pranjal Aggarwal; Sean Welleck

arXiv:2503.04697 · cs.CL · October 6, 2025 · 2 cites

TL;DR

This paper introduces LCPO, a reinforcement learning method to control reasoning chain length in language models, enabling a trade-off between compute and accuracy, and revealing short reasoning capabilities in trained models.

Contribution

We develop LCPO, a method for length-controlled reasoning in language models, and demonstrate both its effectiveness and the short-reasoning abilities that emerge in models trained with it.

Findings

01. L1 outperforms state-of-the-art length control methods.

02. Models trained with LCPO can generate short reasoning chains similar to non-reasoning models.

03. L1 surpasses GPT-4o at comparable reasoning lengths.

Abstract

Reasoning language models have shown an uncanny ability to improve performance at test-time by "thinking longer", that is, by generating longer chain-of-thought sequences and hence using more compute. However, the length of their chain-of-thought reasoning is not controllable, making it impossible to allocate test-time compute to achieve a desired level of performance. We introduce Length Controlled Policy Optimization (LCPO), a simple reinforcement learning method that optimizes for accuracy and adherence to user-specified length constraints. We use LCPO to train L1, a reasoning language model that produces outputs satisfying a length constraint given in its prompt. L1's length control allows for smoothly trading off computational cost and accuracy on a wide range of tasks, and outperforms the state-of-the-art S1 method for length control. Furthermore, we uncover an unexpected short…
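To make the idea of optimizing for both accuracy and adherence to a length constraint concrete, here is a minimal sketch of a scalar reward that an RL trainer could maximize. It is an illustration under stated assumptions, not the paper's definition: the abstract does not give the reward formula, so the linear penalty on the absolute token-count deviation, the coefficient `alpha`, the prompt wording, and the function names `build_prompt` and `length_controlled_reward` are all hypothetical.

```python
def build_prompt(question: str, target_tokens: int) -> str:
    """Attach a length constraint to the prompt.

    Hypothetical template: the abstract only says the constraint is
    "given in its prompt", not how it is phrased.
    """
    return f"{question}\n\nThink for {target_tokens} tokens."


def length_controlled_reward(
    is_correct: bool,
    target_tokens: int,
    used_tokens: int,
    alpha: float = 0.001,  # assumed penalty weight, chosen only for illustration
) -> float:
    """Reward correctness, then subtract a penalty for missing the target length.

    Assumed form: 1 for a correct answer (0 otherwise) minus
    alpha * |target_tokens - used_tokens|.
    """
    correctness = 1.0 if is_correct else 0.0
    length_penalty = alpha * abs(target_tokens - used_tokens)
    return correctness - length_penalty


if __name__ == "__main__":
    print(build_prompt("What is 17 * 24?", 512))
    # A correct answer that hits the target exactly keeps the full reward;
    # overshooting the target by 200 tokens costs alpha * 200.
    print(length_controlled_reward(True, 1000, 1000))
    print(length_controlled_reward(True, 1000, 1200))
```

Under this assumed form, a larger `alpha` favors length adherence over accuracy, and a smaller one does the reverse; the trade-off the abstract describes would come from varying the target length at inference time.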

Code & Models

Repositories

cmu-l3/l1

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Software System Performance and Reliability