Agent-GSPO: Communication-Efficient Multi-Agent Systems via Group Sequence Policy Optimization

Yijia Fan; Jusheng Zhang; Jing Yang; Keze Wang

arXiv:2510.22477 · cs.MA · October 28, 2025


TL;DR

Agent-GSPO is a framework that cuts communication costs in multi-agent systems by training agents with sequence-level reinforcement learning on a reward that penalizes verbosity. It reports state-of-the-art results on seven reasoning benchmarks while using a fraction of the tokens of existing methods.

Contribution

It applies the Group Sequence Policy Optimization (GSPO) algorithm to communication-aware training of multi-agent systems, using a reward that explicitly penalizes verbosity so that agents learn token-efficient communication strategies.

Findings

01 Achieves state-of-the-art performance on seven reasoning benchmarks.

02 Uses a fraction of the token consumption of existing methods.

03 Agents develop emergent strategies such as "strategic silence" that improve communication efficiency.

Abstract

To combat the prohibitive communication costs of "free-for-all" multi-agent systems (MAS), we introduce Agent-GSPO, a framework that directly optimizes for token economy using sequence-level reinforcement learning. Agent-GSPO leverages the stable and memory-efficient Group Sequence Policy Optimization (GSPO) algorithm to train agents on a communication-aware reward that explicitly penalizes verbosity. Across seven reasoning benchmarks, Agent-GSPO not only achieves new state-of-the-art performance but does so with a fraction of the token consumption of existing methods. By fostering emergent strategies like "strategic silence," our approach provides a practical blueprint for developing scalable and economically viable multi-agent systems.
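The abstract names two ingredients: a communication-aware reward that penalizes verbosity, and sequence-level policy optimization over a group of sampled responses. The sketch below illustrates how these could fit together. It is not the paper's implementation: the linear token penalty, the `token_penalty` and `clip_eps` values, and all function names are assumptions made for illustration, and the length-normalized sequence ratio with group-normalized advantages follows the general GSPO formulation as commonly described, not anything stated on this page.

```python
import math
from statistics import mean, pstdev


def communication_aware_reward(correct, num_tokens, token_penalty=0.001):
    """Hypothetical reward: task success minus a linear penalty on tokens used.

    The exact reward used by Agent-GSPO is not given in the abstract;
    this linear form is an assumption.
    """
    return (1.0 if correct else 0.0) - token_penalty * num_tokens


def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def sequence_ratio(new_logprobs, old_logprobs):
    """Length-normalized sequence-level importance ratio.

    Computes exp((sum of new log-probs - sum of old log-probs) / length),
    i.e. the geometric mean of the per-token probability ratios.
    """
    n = len(new_logprobs)
    return math.exp((sum(new_logprobs) - sum(old_logprobs)) / n)


def clipped_sequence_objective(ratios, advantages, clip_eps=0.2):
    """Clipped surrogate objective applied per sequence, averaged over the group."""
    terms = []
    for s, a in zip(ratios, advantages):
        clipped = min(max(s, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(min(s * a, clipped * a))
    return mean(terms)


# Illustrative group of three sampled responses: (answered correctly?, tokens used).
samples = [(True, 120), (True, 600), (False, 40)]
rewards = [communication_aware_reward(c, n) for c, n in samples]
advantages = group_advantages(rewards)
```

Under this assumed reward, a correct answer produced with fewer tokens receives a larger advantage than an equally correct but longer one, which is the kind of pressure that could plausibly lead to behaviors like the "strategic silence" the abstract mentions.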
