Off-Policy Action Anticipation in Multi-Agent Reinforcement Learning

Ariyan Bighashdel; Daan de Geus; Pavol Jancura; Gijs Dubbelman

arXiv:2304.01447 · cs.MA · April 5, 2023 · 1 citation

Open Access

TL;DR

This paper introduces Off-Policy Action Anticipation (OffPA2), a novel framework for learning anticipation in multi-agent reinforcement learning that improves efficiency and performance in large, non-differentiable games by focusing on action rather than policy parameter anticipation.

Contribution

The paper proposes OffPA2, a new off-policy action anticipation framework that extends higher-order gradient (HOG) methods to non-differentiable games with large state spaces, overcoming limitations of existing methods based on policy parameter anticipation.

Findings

01 OffPA2 is more efficient than existing Higher-Order Gradient (HOG) methods.

02 OffPA2 performs better than existing HOG methods in large, non-differentiable games.

03 Theoretical analysis supports the effectiveness of OffPA2.

Abstract

Learning anticipation in Multi-Agent Reinforcement Learning (MARL) is a reasoning paradigm where agents anticipate the learning steps of other agents to improve cooperation among themselves. As MARL uses gradient-based optimization, learning anticipation requires using Higher-Order Gradients (HOG), with so-called HOG methods. Existing HOG methods are based on policy parameter anticipation, i.e., agents anticipate the changes in policy parameters of other agents. Currently, however, these existing HOG methods have only been applied to differentiable games or games with small state spaces. In this work, we demonstrate that in the case of non-differentiable games with large state spaces, existing HOG methods do not perform well and are inefficient due to their inherent limitations related to policy parameter anticipation and multiple sampling stages. To overcome these problems, we propose…
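To make the idea of policy parameter anticipation concrete, here is a minimal sketch on a toy differentiable game. It is not OffPA2 and is not taken from the paper: the bilinear losses `f1 = x*y` and `f2 = -x*y`, the step sizes, and all function names are assumptions chosen for illustration. Each agent evaluates its gradient at the other agent's anticipated parameters after one gradient step, without differentiating through that anticipated step.

```python
# Toy two-agent differentiable game (hypothetical, for illustration only):
# agent 1 controls x and minimizes f1(x, y) = x * y,
# agent 2 controls y and minimizes f2(x, y) = -x * y.

def grads(x, y):
    """Return (df1/dx, df2/dy) for the assumed bilinear losses."""
    return y, -x

def naive_step(x, y, lr):
    """Simultaneous gradient descent with no anticipation."""
    g1, g2 = grads(x, y)
    return x - lr * g1, y - lr * g2

def anticipating_step(x, y, lr, lookahead):
    """Each agent anticipates the other's next parameters before updating."""
    g1, g2 = grads(x, y)
    # Anticipated parameters of the other agent after one gradient step.
    y_anticipated = y - lookahead * g2
    x_anticipated = x - lookahead * g1
    # Each agent's own gradient, evaluated at the anticipated point.
    g1_ahead, _ = grads(x, y_anticipated)
    _, g2_ahead = grads(x_anticipated, y)
    return x - lr * g1_ahead, y - lr * g2_ahead

def run(step, n_steps=200, **kwargs):
    """Run a learning rule from (1, 1); return the final distance to (0, 0)."""
    x, y = 1.0, 1.0
    for _ in range(n_steps):
        x, y = step(x, y, **kwargs)
    return (x * x + y * y) ** 0.5

naive_dist = run(naive_step, lr=0.1)
anticipating_dist = run(anticipating_step, lr=0.1, lookahead=1.0)
print(f"naive: {naive_dist:.3g}, anticipating: {anticipating_dist:.3g}")
```

In this toy setting the naive updates drift away from the fixed point at the origin, while the anticipating updates contract toward it. The sketch needs exact gradients of each agent's objective with respect to the other agent's parameters, which is the kind of requirement that restricts policy parameter anticipation to differentiable games or games with small state spaces.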

Taxonomy

Topics: Reinforcement Learning in Robotics