Shorter Thoughts, Same Answers: Difficulty-Scaled Segment-Wise RL for CoT Compression

Ye Tian; Aijun Liu

arXiv:2603.07598 · cs.AI · March 10, 2026

Open Access

TL;DR

This paper introduces DSS-GRPO, a reinforcement learning method that compresses the reasoning traces of language models. It scales compression pressure by prompt difficulty and applies separate updates to the think and answer segments, reducing token usage while preserving answer quality.

Contribution

The paper proposes DSS-GRPO, a difficulty-scaled, segment-wise RL approach that improves reasoning trace compression without compromising answer accuracy.

Findings

01. Effective reduction in reasoning trace length

02. Maintains answer quality despite compression

03. Outperforms naive RL approaches in experiments

Abstract

Chain-of-thought (CoT) improves reasoning reliability but increases token cost, motivating post-training compression of explicit reasoning traces. However, the shortest sufficient reasoning is not universal: it depends on difficulty, model capacity, and training state, making fixed length targets brittle. In practice, naive RL-based compression can also undesirably shorten the user-facing answer, because a single completion-level learning signal leaks across the think/answer boundary. We propose Difficulty-Scaled Segment-Wise GRPO (DSS-GRPO), which decomposes returns into think and answer components, computes group-relative advantages per segment, and routes them with hard token masks so compression updates act only on think while answer alignment acts only on answer. DSS-GRPO uses prompt-wise within-group shaping and difficulty-aware scaling to encourage concise reasoning without…
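The segment-wise routing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `</think>` boundary token, the fallback for completions without a boundary, and the reward values are all assumptions made for illustration. Difficulty-aware scaling and within-group shaping are omitted because the abstract excerpt here is truncated before it specifies them. The sketch only shows the core idea of computing group-relative advantages separately for the think and answer returns, then using hard token masks so that each advantage reaches only its own segment.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one prompt's group of sampled completions."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def segment_masks(tokens, close_tag="</think>"):
    """Return (think_mask, answer_mask) as 0/1 lists.

    Assumption: tokens up to and including close_tag belong to the think
    segment. If close_tag is absent, the whole completion counts as think.
    """
    boundary = tokens.index(close_tag) + 1 if close_tag in tokens else len(tokens)
    think = [1 if i < boundary else 0 for i in range(len(tokens))]
    answer = [1 - m for m in think]
    return think, answer


def token_advantages(completions, think_rewards, answer_rewards):
    """Route per-segment advantages to tokens with hard masks.

    completions: list of token lists sampled for the same prompt.
    think_rewards / answer_rewards: one hypothetical scalar per completion.
    """
    adv_think = group_relative_advantages(think_rewards)
    adv_answer = group_relative_advantages(answer_rewards)
    per_token = []
    for tokens, a_t, a_a in zip(completions, adv_think, adv_answer):
        think, answer = segment_masks(tokens)
        # Each token receives exactly one advantage, chosen by its segment.
        per_token.append([a_t * t + a_a * a for t, a in zip(think, answer)])
    return per_token
```

With two completions for one prompt, a higher think reward on the first completion raises the advantage of its think tokens only; its answer tokens are driven solely by the answer reward, which is the leak across the think/answer boundary that the abstract says a single completion-level signal cannot prevent.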

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Algorithms