Generalized Per-Agent Advantage Estimation for Multi-Agent Policy Optimization

Seongmin Kim; Giseung Park; Woojun Kim; Jiwon Jeon; Seungyul Han; Youngchul Sung

arXiv:2603.02654·cs.MA·March 10, 2026

Open Access

TL;DR

This paper proposes a multi-agent reinforcement learning framework built around GPAE (Generalized Per-Agent Advantage Estimator), which improves sample efficiency and coordination by estimating per-agent advantages in a stable, off-policy-compatible way.

Contribution

The paper presents GPAE, an advantage estimator that combines a per-agent value iteration operator with a double-truncated importance sampling ratio scheme for multi-agent policy optimization.

Findings

01 Outperforms existing methods on benchmark tasks.

02 Enhances coordination in multi-agent scenarios.

03 Improves sample efficiency significantly.

Abstract

In this paper, we propose a novel framework for multi-agent reinforcement learning that enhances sample efficiency and coordination through accurate per-agent advantage estimation. The core of our approach is the Generalized Per-Agent Advantage Estimator (GPAE), which employs a per-agent value iteration operator to compute precise per-agent advantages. This operator enables stable off-policy learning by estimating values indirectly via action probabilities, eliminating the need for direct Q-function estimation. To further refine estimation, we introduce a double-truncated importance sampling ratio scheme. This scheme improves credit assignment for off-policy trajectories by balancing sensitivity to the agent's own policy changes with robustness to non-stationarity from other agents. Experiments on benchmarks demonstrate that our approach outperforms existing approaches, excelling in…
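
The abstract does not give the GPAE operator or the exact form of the double-truncated ratio, so the following is only a rough illustration of the general idea, not the authors' method. It sketches a generic GAE-style advantage recursion whose trace is weighted by truncated importance ratios (in the spirit of Retrace and V-trace). Everything here is an assumption made for illustration: the function name `truncated_is_advantages`, the choice to clip the agent's own ratio and the other agents' joint ratio separately and multiply them, and the parameters `own_clip` and `others_clip`.

```python
import numpy as np

def truncated_is_advantages(rewards, values, own_ratio, others_ratio,
                            gamma=0.99, lam=0.95,
                            own_clip=1.0, others_clip=1.0):
    """Hypothetical sketch: GAE-style advantages with truncated IS trace weights.

    rewards:      length-T sequence of rewards for one agent
    values:       length-(T+1) sequence of state values (last entry bootstraps)
    own_ratio:    length-T ratios pi_new / pi_behavior for the agent's own action
    others_ratio: length-T joint ratios for the other agents' actions
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    T = len(rewards)

    # Assumed "double truncation": clip each ratio separately, then combine.
    w = (np.minimum(own_clip, np.asarray(own_ratio, dtype=float))
         * np.minimum(others_clip, np.asarray(others_ratio, dtype=float)))

    # One-step TD errors.
    deltas = rewards + gamma * values[1:] - values[:-1]

    # Backward recursion: A_t = delta_t + gamma * lam * w_{t+1} * A_{t+1}.
    adv = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):
        next_w = w[t + 1] if t + 1 < T else 0.0
        acc = deltas[t] + gamma * lam * next_w * acc
        adv[t] = acc
    return adv

# Usage: with all ratios equal to 1 (on-policy data) this reduces to plain GAE.
adv_on_policy = truncated_is_advantages(
    rewards=[1.0, 1.0, 1.0],
    values=[0.0, 0.0, 0.0, 0.0],
    own_ratio=[1.0, 1.0, 1.0],
    others_ratio=[1.0, 1.0, 1.0],
    gamma=0.5, lam=1.0,
)
```

In this sketch, a small clip on the other agents' ratio shortens the trace when their policies have drifted, while the agent's own ratio is clipped independently. That is one plausible reading of "balancing sensitivity to the agent's own policy changes with robustness to non-stationarity from other agents", and the paper's actual scheme may differ.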

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms · Advanced Bandit Algorithms Research