Reinforcement Learning for Chain of Thought Compression with One-Domain-to-All Generalization

Hanyu Li; Jiangshan Duo; Bofei Gao; Hailin Zhang; Sujian Li; Xiaotie Deng; Liang Zhao

arXiv:2601.06052·cs.CL·January 22, 2026

TL;DR

This paper introduces a reinforcement learning method that compresses chain-of-thought reasoning in large language models, cutting response length by 20-40% with comparable or higher accuracy and generalizing across domains.

Contribution

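It presents a mastery-gated, sample-level, soft reinforcement learning approach to chain-of-thought compression: long rollouts are penalized only when the model already solves the problem and has produced a shorter rollout. The resulting compression generalizes across domains and tasks.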

Findings

01

Reduces response length by 20-40% with similar or better accuracy.

02

Enables a model trained on math to shorten responses on unseen code, instruction-following, and general-knowledge QA tasks without hurting accuracy.

03

Transfers to tool-use agents: non-agent training reduces SWE-Bench Verified rounds by 13%.

Abstract

Chain-of-thought reasoning in large language models can trigger an "overthinking trap": longer rollouts raise cost and latency yet often yield unreliable accuracy gains. Existing methods use global, static controls that may suppress needed reasoning. We propose mastery-gated, sample-level, soft reinforcement learning compression that penalizes long rollouts only when the model already solves the problem and has produced a shorter rollout. Across benchmarks, it cuts response length by 20-40% with comparable or higher accuracy and generalizes across domains: a model trained on math spontaneously shortens its responses on unseen tasks (code, instruction following, general-knowledge QA) without hurting accuracy. We further show two-way transfer between non-agent CoT and tool-use agents: non-agent training reduces SWE-Bench Verified rounds by 13%, while compressing a thinking agent cuts SWE trajectories by…
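
The abstract does not give the reward formula, so the following is only an illustrative sketch of how a mastery-gated, sample-level, soft length penalty could look, not the authors' implementation. The names `Rollout`, `gated_length_rewards`, `mastery_threshold`, and `alpha`, the pass-rate gate, and the relative-excess penalty are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    correct: bool  # did this sampled response solve the problem?
    length: int    # response length in tokens


def gated_length_rewards(
    rollouts: List[Rollout],
    mastery_threshold: float = 0.75,  # assumed gate value, not from the paper
    alpha: float = 0.5,               # assumed penalty strength, not from the paper
) -> List[float]:
    """Return one shaped reward per rollout sampled for a single prompt."""
    base = [1.0 if r.correct else 0.0 for r in rollouts]
    if not rollouts:
        return base

    # Mastery gate (assumed form): compress only prompts that the model
    # already solves in a large enough fraction of its samples.
    pass_rate = sum(base) / len(rollouts)
    correct_lengths = [r.length for r in rollouts if r.correct]
    if pass_rate < mastery_threshold or not correct_lengths:
        return base

    shortest = min(correct_lengths)
    rewards = []
    for r, b in zip(rollouts, base):
        if r.correct and r.length > shortest:
            # Soft, sample-level penalty: applies only when a shorter correct
            # rollout exists, grows with relative excess length, and stays
            # below alpha so a correct rollout still beats an incorrect one.
            excess = (r.length - shortest) / r.length
            rewards.append(b - alpha * excess)
        else:
            rewards.append(b)
    return rewards


mastered = [Rollout(True, 100), Rollout(True, 200),
            Rollout(True, 400), Rollout(False, 800)]
unmastered = [Rollout(True, 100), Rollout(True, 400),
              Rollout(False, 300), Rollout(False, 500)]

print(gated_length_rewards(mastered))    # [1.0, 0.75, 0.625, 0.0]
print(gated_length_rewards(unmastered))  # [1.0, 1.0, 0.0, 0.0]
```

In the first group the prompt counts as mastered (three of four samples are correct), so the longer correct rollouts are penalized relative to the shortest one. In the second group the pass rate is below the assumed threshold, so no length penalty is applied and long reasoning is left untouched.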

Taxonomy

Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Machine Learning and Algorithms