Long Chain-of-Thought Compression via Fine-Grained Group Policy Optimization
Xinchen Han, Hossam Afifi, Michel Marot, Xilu Wang, Lu Yin

TL;DR
This paper introduces FGO, a reinforcement learning method that compresses Chain-of-Thought reasoning in large language models by refining group responses, improving efficiency without performance loss.
Contribution
FGO is an RL algorithm that subdivides group responses and weights them by length and entropy to compress CoT, and it addresses two limitations of GRPO: inefficient data utilization and entropy collapse.
Findings
FGO achieves efficient CoT compression without performance degradation.
FGO outperforms previous methods on multiple reasoning benchmarks.
FGO resolves GRPO's inefficient data utilization and entropy collapse.
Abstract
Large Language Models (LLMs) often generate unnecessarily verbose Chain-of-Thought (CoT) reasoning that increases computational costs and latency without proportional performance gains. In this paper, we propose Fine-grained Group policy Optimization (FGO), a Reinforcement Learning (RL) algorithm that refines group responses by subdividing them and assigning appropriate weights based on length and entropy, thereby enabling effective CoT compression. Meanwhile, as an enhanced variant of Group Relative Policy Optimization (GRPO), FGO successfully addresses two major limitations of GRPO: inefficient data utilization and entropy collapse. We evaluate FGO on multiple reasoning LLMs and benchmarks, including MATH500, AIME24, AMC23, and Minerva. Experimental results show that FGO achieves efficient CoT compression without degrading performance, and simultaneously resolves the key…
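For readers unfamiliar with the GRPO baseline that the abstract builds on, the following is a minimal sketch of group-relative advantage computation, not the authors' code. It assumes scalar rewards per sampled response and population standard deviation for normalization (implementations differ on this); the function name `grpo_advantages` and the `eps` parameter are illustrative. It also shows one commonly cited way a group can yield no learning signal, which may be what "inefficient data utilization" refers to. The FGO subdivision and its length- and entropy-based weights are not reproduced here because this excerpt does not give their formulas.

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-8):
    """Standardize each reward within its group (hypothetical helper).

    Each response's advantage is its reward minus the group mean,
    divided by the group standard deviation (plus eps for stability).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]

# A group with mixed outcomes: correct responses get positive advantage,
# incorrect ones get negative advantage.
mixed = grpo_advantages([1.0, 0.0, 1.0, 0.0])

# A group where every response earns the same reward: all advantages are
# zero, so this group contributes no gradient signal.
uniform = grpo_advantages([1.0, 1.0, 1.0, 1.0])

print(mixed)
print(uniform)
```

In the uniform case every advantage is exactly zero, so the sampled responses are effectively discarded for that update, regardless of how long or short each one was.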
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Big Data and Digital Economy
