Generalized Per-Agent Advantage Estimation for Multi-Agent Policy Optimization
Seongmin Kim, Giseung Park, Woojun Kim, Jiwon Jeon, Seungyul Han, Youngchul Sung

TL;DR
This paper introduces GPAE, a novel multi-agent reinforcement learning framework that improves sample efficiency and coordination by accurately estimating per-agent advantages using a stable, off-policy compatible method.
Contribution
The paper presents GPAE, an advantage estimator that combines a per-agent value iteration operator with a double-truncated importance sampling ratio scheme for multi-agent policy optimization.
Findings
Outperforms existing methods on benchmark tasks.
Enhances coordination in multi-agent scenarios.
Improves sample efficiency significantly.
Abstract
In this paper, we propose a novel framework for multi-agent reinforcement learning that enhances sample efficiency and coordination through accurate per-agent advantage estimation. The core of our approach is the Generalized Per-Agent Advantage Estimator (GPAE), which employs a per-agent value iteration operator to compute precise per-agent advantages. This operator enables stable off-policy learning by indirectly estimating values via action probabilities, eliminating the need for direct Q-function estimation. To further refine estimation, we introduce a double-truncated importance sampling ratio scheme. This scheme improves credit assignment for off-policy trajectories by balancing sensitivity to the agent's own policy changes with robustness to non-stationarity from other agents. Experiments on benchmarks demonstrate that our approach outperforms existing approaches, excelling in…
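The abstract does not give the exact form of the double-truncated importance sampling ratio, so the following is only a rough sketch of the general idea under stated assumptions, not the paper's estimator. It assumes the agent's own probability ratio and the joint ratio of the other agents are each clipped from above by separate constants (the hypothetical `own_clip` and `others_clip`), and that the resulting weight scales a GAE-style backward recursion over TD errors. All function names, parameters, and default values here are illustrative.

```python
def truncated_ratio(pi_new, pi_old, clip):
    """Importance sampling ratio pi_new / pi_old, truncated from above at `clip`."""
    return min(clip, pi_new / pi_old)


def double_truncated_weight(own_new, own_old, others_new, others_old,
                            own_clip=1.0, others_clip=1.0):
    """Assumed form: separately truncated own-agent and other-agent ratios, multiplied."""
    own = truncated_ratio(own_new, own_old, own_clip)
    # Joint ratio of the other agents: product of their individual ratios.
    others = 1.0
    for new_p, old_p in zip(others_new, others_old):
        others *= new_p / old_p
    others = min(others_clip, others)
    return own * others


def weighted_advantages(rewards, values, weights, gamma=0.99, lam=0.95):
    """GAE-style backward recursion with per-step trace weights.

    `values` holds one more entry than `rewards` (the bootstrap value).
    A_t = delta_t + gamma * lam * weights[t + 1] * A_{t+1}
    """
    horizon = len(rewards)
    advantages = [0.0] * horizon
    running = 0.0
    for t in reversed(range(horizon)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        next_weight = weights[t + 1] if t + 1 < horizon else 0.0
        running = delta + gamma * lam * next_weight * running
        advantages[t] = running
    return advantages
```

In this sketch, a small `own_clip` limits how much a change in the agent's own policy can amplify a trajectory's contribution, while `others_clip` bounds the effect of policy drift in the other agents. How the paper actually sets or combines the two truncations is not stated in the abstract.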
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms · Advanced Bandit Algorithms Research
