Self-Compression of Chain-of-Thought via Multi-Agent Reinforcement Learning

Yiqun Chen; Jinyuan Feng; Wei Yang; Meizhi Zhong; Zhengliang Shi; Rui Li; Xiaochi Wei; Yan Gao; Yi Wu; Yao Hu; Zhiqiang Pu; Jiaxin Mao

arXiv:2601.21919·cs.AI·January 30, 2026

Open Access

TL;DR

This paper introduces SCMA, a multi-agent reinforcement learning framework that trims redundant reasoning in Large Reasoning Models, cutting response length by 11.1% to 39.0% while improving reasoning accuracy by 4.33% to 10.02%.

Contribution

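The paper proposes a multi-agent RL approach with two specialized agents: a Segmentation Agent that decomposes the reasoning process into logical chunks and a Scoring Agent that quantifies the significance of each chunk. Penalizing only the redundant chunks lets the method shorten responses without cutting essential reasoning logic.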

Findings

01. Reduces response length by 11.1% to 39.0%.

02. Boosts reasoning accuracy by 4.33% to 10.02%.

03. Demonstrates emergent behaviors surpassing vanilla RL.

Abstract

The inference overhead induced by redundant reasoning undermines the interactive experience and severely bottlenecks the deployment of Large Reasoning Models. Existing reinforcement learning (RL)-based solutions tackle this problem by coupling a length penalty with outcome-based rewards. This simplistic reward weighting struggles to reconcile brevity with accuracy, as enforcing brevity may compromise critical reasoning logic. In this work, we address this limitation by proposing a multi-agent RL framework that selectively penalizes redundant chunks, while preserving essential reasoning logic. Our framework, Self-Compression via MARL (SCMA), instantiates redundancy detection and evaluation through two specialized agents: a Segmentation Agent for decomposing the reasoning process into logical chunks, and a Scoring Agent for quantifying the significance of each chunk. The…
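To make the contrast in the abstract concrete, the sketch below compares a uniform length penalty with a penalty applied only to chunks judged redundant. It is an illustration under stated assumptions, not the SCMA reward: the abstract is truncated before the actual formulation, so the `Chunk` type, the `importance` score in [0, 1], the coefficient `lam`, and both reward functions are hypothetical stand-ins for what a Segmentation Agent and a Scoring Agent might produce.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """One logical chunk of a reasoning trace (hypothetical structure)."""
    text: str
    n_tokens: int
    importance: float  # assumed score in [0, 1]; 1.0 means essential


def uniform_length_reward(correct: bool, chunks: List[Chunk], lam: float = 0.001) -> float:
    """Outcome reward minus a penalty on every token, regardless of content."""
    total_tokens = sum(c.n_tokens for c in chunks)
    return (1.0 if correct else 0.0) - lam * total_tokens


def selective_length_reward(correct: bool, chunks: List[Chunk], lam: float = 0.001) -> float:
    """Outcome reward minus a penalty discounted by each chunk's importance.

    Assumed form: tokens in a chunk with importance 1.0 cost nothing,
    tokens in a chunk with importance 0.0 cost the full per-token penalty.
    """
    penalized_tokens = sum((1.0 - c.importance) * c.n_tokens for c in chunks)
    return (1.0 if correct else 0.0) - lam * penalized_tokens


# Toy trace: two essential steps and one redundant re-check.
chunks = [
    Chunk("Set up the equation.", 40, 1.0),
    Chunk("Wait, let me re-check that again...", 100, 0.0),
    Chunk("Solve and state the answer.", 60, 1.0),
]

print(uniform_length_reward(True, chunks))
print(selective_length_reward(True, chunks))
```

In this toy trace the uniform penalty charges all 200 tokens, while the selective variant charges only the 100 tokens of the redundant re-check, so essential steps are not discouraged by the length term.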

Taxonomy

Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Embodied and Extended Cognition