Multi-Agent Trust Region Policy Optimisation: A Joint Constraint Approach

Chak Lam Shek; Guangyao Shi; Pratap Tokekar

arXiv:2508.10340·cs.AI·August 15, 2025

TL;DR

This paper introduces two novel methods for adaptive trust region constraint allocation in multi-agent reinforcement learning, significantly improving convergence speed and performance stability in heterogeneous settings.

Contribution

The paper proposes a KKT-based algorithm (HATRPO-W) and a greedy algorithm (HATRPO-G) for dynamically assigning per-agent KL thresholds, making trust region policy optimization more effective in multi-agent systems.

Findings

01. Both methods outperform baseline HATRPO in convergence speed.

02. Both methods achieve over 22.5% improvement in final rewards.

03. HATRPO-W shows more stable learning with lower variance.

Abstract

Multi-agent reinforcement learning (MARL) requires coordinated and stable policy updates among interacting agents. Heterogeneous-Agent Trust Region Policy Optimization (HATRPO) enforces per-agent trust region constraints using Kullback-Leibler (KL) divergence to stabilize training. However, assigning each agent the same KL threshold can lead to slow and locally optimal updates, especially in heterogeneous settings. To address this limitation, we propose two approaches for allocating the KL divergence threshold across agents: HATRPO-W, a Karush-Kuhn-Tucker-based (KKT-based) method that optimizes threshold assignment under global KL constraints, and HATRPO-G, a greedy algorithm that prioritizes agents based on improvement-to-divergence ratio. By connecting sequential policy optimization with constrained threshold scheduling, our approach enables more flexible and effective learning in…
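
The greedy idea named in the abstract (prioritizing agents by improvement-to-divergence ratio under a shared KL budget) can be illustrated with a minimal sketch. This is not the authors' HATRPO-G implementation: the function name, the inputs `improvements` and `divergences`, and the rule of capping each agent at its candidate update's KL cost are all assumptions made for illustration only.

```python
from typing import List


def greedy_kl_allocation(
    improvements: List[float],
    divergences: List[float],
    total_budget: float,
) -> List[float]:
    """Hypothetical sketch: split a shared KL budget across agents.

    improvements[i]: assumed estimate of the gain from agent i's candidate update.
    divergences[i]:  assumed KL cost of that candidate update (must be > 0).
    Returns a per-agent KL threshold; the thresholds sum to at most total_budget.
    """
    if len(improvements) != len(divergences):
        raise ValueError("improvements and divergences must have the same length")
    if any(d <= 0 for d in divergences):
        raise ValueError("divergences must be strictly positive")

    # Rank agents by improvement-to-divergence ratio, best first.
    order = sorted(
        range(len(improvements)),
        key=lambda i: improvements[i] / divergences[i],
        reverse=True,
    )

    thresholds = [0.0] * len(improvements)
    remaining = total_budget
    for i in order:
        if remaining <= 0:
            break
        # Grant the agent its full candidate cost, or whatever budget is left.
        grant = min(divergences[i], remaining)
        thresholds[i] = grant
        remaining -= grant
    return thresholds


# Illustrative numbers only (not taken from the paper).
example = greedy_kl_allocation(
    improvements=[1.0, 3.0, 2.4],
    divergences=[0.02, 0.02, 0.04],
    total_budget=0.05,
)
```

In this toy call the second agent has the highest ratio and is served first, the third agent receives the remaining budget, and the first agent receives none. A uniform split would instead give every agent the same threshold regardless of how much improvement each unit of divergence buys.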
