Agent-GSPO: Communication-Efficient Multi-Agent Systems via Group Sequence Policy Optimization
Yijia Fan, Jusheng Zhang, Jing Yang, Keze Wang

TL;DR
Agent-GSPO is a novel framework that reduces communication costs in multi-agent systems by optimizing token usage through sequence-level reinforcement learning, achieving state-of-the-art results with less verbosity.
Contribution
It applies Group Sequence Policy Optimization (GSPO) to communication-aware training of multi-agent systems, rewarding token economy and encouraging emergent communication strategies.
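The GSPO objective below is shown for reference. It follows the formulation in the original GSPO work, which this page does not restate. For each query x, a group of G responses is sampled from the old policy. Advantages are normalized within the group, and the importance ratio is computed once per sequence with length normalization; this is the sequence-level optimization the framework builds on. The last line is an assumption for illustration only: a task reward minus a per-token penalty with a hypothetical weight λ stands in for a communication-aware reward. The paper's exact reward is not given here.

```latex
\begin{aligned}
J_{\mathrm{GSPO}}(\theta) &= \mathbb{E}_{x\sim\mathcal{D},\,\{y_i\}_{i=1}^{G}\sim\pi_{\theta_{\mathrm{old}}}(\cdot\mid x)}
  \left[\frac{1}{G}\sum_{i=1}^{G}\min\Big(s_i(\theta)\,\hat{A}_i,\ \operatorname{clip}\big(s_i(\theta),1-\varepsilon,1+\varepsilon\big)\,\hat{A}_i\Big)\right]\\
s_i(\theta) &= \left(\frac{\pi_\theta(y_i\mid x)}{\pi_{\theta_{\mathrm{old}}}(y_i\mid x)}\right)^{1/|y_i|}
  = \exp\!\left(\frac{1}{|y_i|}\sum_{t=1}^{|y_i|}\log\frac{\pi_\theta(y_{i,t}\mid x,y_{i,<t})}{\pi_{\theta_{\mathrm{old}}}(y_{i,t}\mid x,y_{i,<t})}\right)\\
\hat{A}_i &= \frac{r(x,y_i)-\operatorname{mean}\big(\{r(x,y_j)\}_{j=1}^{G}\big)}{\operatorname{std}\big(\{r(x,y_j)\}_{j=1}^{G}\big)}\\
r(x,y_i) &= r_{\mathrm{task}}(x,y_i)-\lambda\,|y_i| \qquad \text{(assumed illustrative form)}
\end{aligned}
```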
Findings
Achieves state-of-the-art performance on seven reasoning benchmarks.
Reduces token consumption significantly compared to existing methods.
Agents develop emergent strategies, such as 'strategic silence', that improve communication efficiency.
Abstract
To combat the prohibitive communication costs of "free-for-all" multi-agent systems (MAS), we introduce Agent-GSPO, a framework that directly optimizes for token economy using sequence-level reinforcement learning. Agent-GSPO leverages the stable and memory-efficient Group Sequence Policy Optimization (GSPO) algorithm to train agents on a communication-aware reward that explicitly penalizes verbosity. Across seven reasoning benchmarks, Agent-GSPO not only achieves new state-of-the-art performance but does so with a fraction of the token consumption of existing methods. By fostering emergent strategies like "strategic silence," our approach provides a practical blueprint for developing scalable and economically viable multi-agent systems.
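The training signal in the abstract can be sketched in plain Python. This is a minimal sketch under stated assumptions and not the authors' implementation. `communication_aware_reward` is hypothetical: it uses a linear token penalty with an illustrative weight `lam`. Per-token log-probabilities are passed in as plain lists rather than produced by a model, and `clip_eps` is an illustrative value. The remaining helpers follow the GSPO recipe of group-normalized advantages and a length-normalized, sequence-level importance ratio.

```python
import math
from statistics import mean, pstdev


def communication_aware_reward(task_score, num_tokens, lam=0.001):
    """Hypothetical reward: task score minus a linear penalty per message token."""
    return task_score - lam * num_tokens


def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group (zero mean, unit std)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def sequence_ratio(logp_new, logp_old):
    """Length-normalized sequence importance ratio:
    exp(mean over tokens of log pi_new - log pi_old)."""
    if len(logp_new) != len(logp_old):
        raise ValueError("log-prob lists must have the same length")
    n = max(len(logp_new), 1)  # an empty message gets ratio exp(0) = 1
    return math.exp(sum(a - b for a, b in zip(logp_new, logp_old)) / n)


def gspo_objective(logps_new, logps_old, rewards, clip_eps=0.2):
    """Clipped surrogate averaged over the group (a quantity to maximize)."""
    advantages = group_advantages(rewards)
    total = 0.0
    for lp_new, lp_old, adv in zip(logps_new, logps_old, advantages):
        s = sequence_ratio(lp_new, lp_old)
        s_clipped = min(max(s, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(s * adv, s_clipped * adv)
    return total / len(rewards)


# Usage: four sampled replies, each as (task score, message length in tokens).
samples = [(1.0, 200), (1.0, 0), (0.0, 50), (1.0, 800)]
rewards = [communication_aware_reward(score, n) for score, n in samples]
advantages = group_advantages(rewards)
print([round(a, 3) for a in advantages])
```

In this toy group, the correct zero-token reply (a crude stand-in for "strategic silence") receives the largest advantage. The token penalty is what separates it from the longer correct replies. When the old and new policies are identical, every ratio is 1. The objective then reduces to the mean advantage, which is zero by construction.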
