Graph-GRPO: Stabilizing Multi-Agent Topology Learning via Group Relative Policy Optimization

Yueyang Cang; Xiaoteng Zhang; Erlu Zhao; Zehua Ji; Yuhang Liu; Yuchen He; Zhiyuan Ning; Chen Yijun; Wenge Que; Li Shi

arXiv:2603.02701 · cs.CL · March 4, 2026

TL;DR

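Graph-GRPO applies Group Relative Policy Optimization to communication-topology learning in LLM-based multi-agent systems. Sampling a group of candidate graphs per query, rather than evaluating one topology in isolation, reduces reward noise and improves training stability and task performance.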

Contribution

The paper proposes a topology optimization framework built on Group Relative Policy Optimization that mitigates reward variance and enables more precise credit assignment.

Findings

01. Outperforms state-of-the-art baselines on reasoning and code generation tasks.

02. Trains more stably and better identifies critical communication links.

03. Reduces reward noise caused by variance in task difficulty.

Abstract

Optimizing communication topology is fundamental to the efficiency and effectiveness of Large Language Model (LLM)-based Multi-Agent Systems (MAS). While recent approaches utilize reinforcement learning to dynamically construct task-specific graphs, they typically rely on single-sample policy gradients with absolute rewards (e.g., binary correctness). This paradigm suffers from severe gradient variance and the credit assignment problem: simple queries yield non-informative positive rewards for suboptimal structures, while difficult queries often result in failures that provide no learning signal. To address these challenges, we propose Graph-GRPO, a novel topology optimization framework that integrates Group Relative Policy Optimization. Instead of evaluating a single topology in isolation, Graph-GRPO samples a group of diverse communication graphs for each query and computes the…
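The abstract is cut off before it states what is computed for each group. As a rough illustration only, the sketch below uses the standard GRPO normalization (each reward minus the group mean, divided by the group standard deviation). Whether Graph-GRPO uses exactly this form is an assumption. The function names, the agent names, and the independent Bernoulli edge model are hypothetical and not taken from the paper.

```python
import math
import random


def sample_graph(edge_probs, rng):
    """Sample one communication graph: an independent Bernoulli draw per edge.

    Assumption: the policy is a dict mapping (sender, receiver) -> probability.
    """
    return {edge for edge, p in edge_probs.items() if rng.random() < p}


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one query's group (standard GRPO form)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    return [(r - mean) / (math.sqrt(var) + eps) for r in rewards]


rng = random.Random(0)
# Hypothetical two-edge policy over three illustrative agents.
edge_probs = {("planner", "coder"): 0.5, ("coder", "tester"): 0.5}
group = [sample_graph(edge_probs, rng) for _ in range(4)]

easy = group_relative_advantages([1, 1, 1, 1])   # every sampled graph succeeds
mixed = group_relative_advantages([1, 0, 0, 1])  # only some graphs succeed
```

Under this normalization with binary rewards, a query that every sampled graph solves yields all-zero advantages and so contributes no gradient, while a mixed group assigns positive advantages to the graphs that succeeded and negative ones to those that failed.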

Code & Models

Models

🤗 yannabadie/sage-topology-policy-v2 (model)

Taxonomy

Topics: Advanced Graph Neural Networks · Multimodal Machine Learning Applications · Topic Modeling