Off-Policy Action Anticipation in Multi-Agent Reinforcement Learning
Ariyan Bighashdel, Daan de Geus, Pavol Jancura, Gijs Dubbelman

TL;DR
This paper introduces Off-Policy Action Anticipation (OffPA2), a novel framework for learning anticipation in multi-agent reinforcement learning that improves efficiency and performance in large, non-differentiable games by anticipating other agents' actions rather than their policy parameters.
Contribution
The paper proposes OffPA2, a new off-policy action anticipation framework that extends higher-order gradient (HOG) methods to non-differentiable games with large state spaces, overcoming the limitations of existing methods based on policy parameter anticipation.
Findings
OffPA2 is more efficient than existing HOG methods.
OffPA2 achieves better performance than existing HOG methods in large, non-differentiable games.
Theoretical analysis supports the effectiveness of OffPA2.
Abstract
Learning anticipation in Multi-Agent Reinforcement Learning (MARL) is a reasoning paradigm in which agents anticipate the learning steps of other agents to improve cooperation among themselves. Because MARL relies on gradient-based optimization, learning anticipation requires Higher-Order Gradients (HOG); methods built on them are called HOG methods. Existing HOG methods are based on policy parameter anticipation, i.e., agents anticipate the changes in the policy parameters of other agents. So far, however, these methods have only been applied to differentiable games or to games with small state spaces. In this work, we demonstrate that in non-differentiable games with large state spaces, existing HOG methods perform poorly and are inefficient, owing to inherent limitations of policy parameter anticipation and to their multiple sampling stages. To overcome these problems, we propose…
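To make "policy parameter anticipation" and the need for higher-order gradients concrete, here is a minimal sketch in a toy two-player differentiable game. It is a LOLA-style lookahead written for illustration only; it is not OffPA2 and not the authors' code. The game, the losses `loss1`/`loss2`, the step size `alpha`, and all helper names are hypothetical. Agent 1 (parameter `x`) assumes agent 2 (parameter `y`) will take one gradient step, and differentiates through that step. The chain rule then brings in the mixed second derivative of agent 2's loss, which is the higher-order term. In this toy game it is known in closed form. In a non-differentiable game with a large state space it would have to be estimated from samples, which is where the abstract locates the inefficiency of existing HOG methods.

```python
# Hypothetical two-player differentiable game (illustration only, not from the paper):
# agent 1 controls parameter x, agent 2 controls parameter y.
def loss1(x: float, y: float) -> float:
    return 0.5 * x**2 + x * y

def loss2(x: float, y: float) -> float:
    return 0.5 * y**2 - x * y

# Hand-derived partial derivatives of the toy losses.
def d_loss1_dx(x: float, y: float) -> float:
    return x + y

def d_loss1_dy(x: float, y: float) -> float:
    return x

def d_loss2_dy(x: float, y: float) -> float:
    return y - x

def d2_loss2_dy_dx(x: float, y: float) -> float:
    # Mixed second derivative of agent 2's loss: the higher-order gradient term.
    return -1.0

def naive_grad(x: float, y: float) -> float:
    """Gradient agent 1 uses when it treats y as fixed (no anticipation)."""
    return d_loss1_dx(x, y)

def anticipating_grad(x: float, y: float, alpha: float) -> float:
    """Gradient of loss1(x, y_next), where y_next = y - alpha * dloss2/dy is
    agent 2's anticipated gradient step (policy parameter anticipation)."""
    y_next = y - alpha * d_loss2_dy(x, y)        # anticipated parameter update of agent 2
    dy_next_dx = -alpha * d2_loss2_dy_dx(x, y)   # sensitivity of that update to x
    return d_loss1_dx(x, y_next) + d_loss1_dy(x, y_next) * dy_next_dx

def finite_difference_check(x: float, y: float, alpha: float, eps: float = 1e-6) -> float:
    """Central-difference derivative of the lookahead loss, to validate the chain rule."""
    def lookahead_loss(x_: float) -> float:
        return loss1(x_, y - alpha * d_loss2_dy(x_, y))
    return (lookahead_loss(x + eps) - lookahead_loss(x - eps)) / (2 * eps)

if __name__ == "__main__":
    print(naive_grad(1.0, 0.0))              # 1.0
    print(anticipating_grad(1.0, 0.0, 0.5))  # 2.0
```

At `x = 1.0`, `y = 0.0`, `alpha = 0.5`, anticipation doubles agent 1's gradient compared with the naive update. The difference comes entirely from the second-order term, which is why methods that anticipate parameter changes need higher-order gradients.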
Taxonomy
Topics: Reinforcement Learning in Robotics
