Multi-agent Actor-Critic with Time Dynamical Opponent Model

Yuan Tian; Klaus-Rudolf Kladny; Qin Wang; Zhiwu Huang; Olga Fink

arXiv:2204.05576·cs.AI·April 13, 2022

Open Access

TL;DR

This paper introduces a novel Time Dynamical Opponent Model (TDOM) for multi-agent reinforcement learning, improving opponent prediction and training stability, especially in mixed cooperative-competitive environments.

Contribution

The paper proposes TDOM, a new opponent modeling approach that leverages the tendency of agents' policies to improve over time, and integrates it into an Actor-Critic framework for enhanced performance.

Findings

01. TDOM-AC outperforms state-of-the-art methods in various environments.

02. TDOM provides superior opponent behavior prediction during testing.

03. The approach results in more stable training and faster convergence.

Abstract

In multi-agent reinforcement learning, multiple agents learn simultaneously while interacting with a common environment and with each other. Because the agents adapt their policies during learning, not only does the behavior of each individual agent become non-stationary, but so does the environment as perceived by that agent. This makes policy improvement particularly challenging. In this paper, we propose to exploit the fact that the agents seek to improve their expected cumulative reward, and we introduce a novel Time Dynamical Opponent Model (TDOM) to encode the knowledge that opponent policies tend to improve over time. We motivate TDOM theoretically by deriving a lower bound of the log objective of an individual agent, and we further propose Multi-Agent Actor-Critic with Time Dynamical Opponent Model (TDOM-AC). We evaluate the proposed TDOM-AC on a differential game and…

Taxonomy

Topics: Reinforcement Learning in Robotics