End-to-End Optimization of LLM-Driven Multi-Agent Search Systems via Heterogeneous-Group-Based Reinforcement Learning

Guanzhong Chen; Shaoxiong Yang; Chao Li; Wei Liu; Jian Luan; Zenglin Xu

arXiv:2506.02718·cs.LG·April 21, 2026

TL;DR

This paper presents Multi-Agent Heterogeneous Group Policy Optimization (MHGPO), a multi-agent reinforcement learning method for training LLM-driven multi-agent search systems end to end. It optimizes for global (system-level) success and efficiently captures inter-agent dependencies.

Contribution

The paper introduces MHGPO, an end-to-end optimization approach for multi-agent search systems that estimates advantages across heterogeneous groups, which reduces training complexity.
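As a rough illustration of estimating advantages from groups rather than from a learned critic, the sketch below normalizes each reward against the mean and standard deviation of its own group, in the style of group-relative methods such as GRPO. It is an assumption-laden sketch, not the paper's heterogeneous-group procedure, whose details are not given on this page. The function name `group_relative_advantages`, the role names, and the reward values are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each rollout relative to its own group (no critic network).

    Generic group-relative normalization, NOT the exact MHGPO estimator.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps keeps the division finite when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical groups: rollouts collected per agent role, each scored by a
# shared end-of-episode reward (names and values are made up for illustration).
groups = {
    "rewriter": [1.0, 0.0, 1.0, 0.0],
    "reranker": [0.5, 0.5, 1.0, 0.0],
}

advantages = {
    role: group_relative_advantages(rewards) for role, rewards in groups.items()
}

for role, adv in advantages.items():
    print(role, [round(a, 3) for a in adv])
```

Because the baseline is the group mean, the advantages in each group sum to approximately zero, so rollouts are only rewarded for doing better than their peers in the same group.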

Findings

01. MHGPO outperforms existing methods in task performance.

02. MHGPO improves computational efficiency when training multi-agent systems.

03. MHGPO effectively captures inter-agent dependencies.

Abstract

Large language models (LLMs) are versatile, yet their deployment in complex real-world settings is limited by static knowledge cutoffs and the difficulty of producing controllable behavior within a single inference. Multi-agent search systems (MASS), which coordinate specialized LLM agents equipped with search tools, mitigate these issues via task decomposition and retrieval-augmented problem solving. However, optimizing LLMs for agent-specific roles remains labor-intensive with prompt engineering or supervised fine-tuning, motivating automated end-to-end training. Existing multi-agent reinforcement learning (MARL) methods such as Multi-Agent Proximal Policy Optimization (MAPPO) typically depend on large critic networks to evaluate joint actions, leading to instability and high memory costs. We introduce Multi-Agent Heterogeneous Group Policy Optimization (MHGPO), which updates policies…
