Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning
Mingyang Song, Mao Zheng

TL;DR
This paper introduces ConciseR, a two-stage reinforcement learning framework that makes large language models' responses more concise and their reasoning more efficient, outperforming recent state-of-the-art models on multiple reasoning benchmarks.
Contribution
The paper proposes ConciseR, a novel two-stage RL approach built on a walk-before-you-run strategy: it first strengthens the model's reasoning capabilities, then enforces response conciseness while maintaining reasoning quality.
Findings
ConciseR generates more concise reasoning responses.
Outperforms recent state-of-the-art models on multiple benchmarks.
Effective in reducing overthinking and redundancy in LLM reasoning.
Abstract
As test-time scaling becomes a pivotal research frontier in Large Language Model (LLM) development, contemporary post-training methods increasingly focus on extending the generation length of long Chain-of-Thought (CoT) responses to enhance reasoning capabilities toward DeepSeek-R1-like performance. However, recent studies reveal a persistent overthinking phenomenon in state-of-the-art reasoning models, manifesting as excessive redundancy or repetitive thinking patterns in long CoT responses. To address this issue, we propose ConciseR, a simple yet effective two-stage reinforcement learning framework for achieving concise reasoning in LLMs. Specifically, the first stage, which uses more training steps, aims to incentivize the model's reasoning capabilities via Group Relative Policy Optimization with clip-higher and dynamic sampling components…
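The first-stage components named in the abstract (GRPO, clip-higher, dynamic sampling) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes binary correctness rewards, uses the population standard deviation for group normalization, and borrows the decoupled clip bounds eps_low = 0.2 and eps_high = 0.28 popularized by DAPO as illustrative defaults. All function names are made up for this sketch.

```python
import math
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: normalize each sampled response's reward
    by the mean and (population) std of its group of responses to the same prompt."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def keep_group(rewards: Sequence[float]) -> bool:
    """Dynamic sampling (assumed DAPO-style filter): drop groups whose rewards
    are all identical, since every advantage would be zero and yield no gradient."""
    return max(rewards) > min(rewards)


def clip_higher_objective(ratio: float, advantage: float,
                          eps_low: float = 0.2, eps_high: float = 0.28) -> float:
    """Per-token PPO-style clipped surrogate with a decoupled clip range.
    A larger eps_high ("clip-higher") lets low-probability tokens grow more,
    which is meant to counteract entropy collapse. The eps values are illustrative."""
    clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical rewards for two prompts, four sampled responses each (1 = correct).
    groups = [[1.0, 0.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
    kept = [g for g in groups if keep_group(g)]  # the all-correct group is filtered out
    for g in kept:
        print(group_relative_advantages(g))
    print(clip_higher_objective(ratio=1.5, advantage=1.0))
```

The asymmetric clip range is the only difference from the standard PPO surrogate: a positive-advantage token can raise its probability ratio up to 1 + eps_high before clipping, while the lower bound stays at 1 - eps_low.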
