Multi-agent Actor-Critic with Time Dynamical Opponent Model
Yuan Tian, Klaus-Rudolf Kladny, Qin Wang, Zhiwu Huang, Olga Fink

TL;DR
This paper introduces a novel Time Dynamical Opponent Model (TDOM) for multi-agent reinforcement learning, improving opponent prediction and training stability, especially in mixed cooperative-competitive environments.
Contribution
The paper proposes TDOM, a new opponent modeling approach that leverages the tendency of agents' policies to improve over time, and integrates it into an Actor-Critic framework for enhanced performance.
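The idea that an opponent's policy tends to improve over time can be illustrated with a small sketch. This is not the paper's formulation of TDOM, which is not given in this summary. It is an assumed stand-in that uses a multiplicative-weights style update, in which the predicted next opponent policy shifts probability mass toward actions the opponent values more. The function names, the `step_size` parameter, and the toy action values are all hypothetical.

```python
import math


def predict_next_opponent_policy(current_policy, action_values, step_size=1.0):
    """Hypothetical sketch: predict the opponent's next policy by shifting
    probability mass toward actions with higher (assumed) opponent value."""
    # Subtract the maximum value before exponentiating for numerical stability.
    max_value = max(action_values)
    weights = [
        p * math.exp(step_size * (q - max_value))
        for p, q in zip(current_policy, action_values)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def expected_value(policy, action_values):
    """Expected value of a discrete policy under fixed action values."""
    return sum(p * q for p, q in zip(policy, action_values))


# Toy example: three opponent actions with assumed values.
policy = [1 / 3, 1 / 3, 1 / 3]
opponent_values = [0.0, 1.0, 2.0]

trajectory = [policy]
for _ in range(3):
    policy = predict_next_opponent_policy(policy, opponent_values)
    trajectory.append(policy)

for step, p in enumerate(trajectory):
    print(step, [round(x, 3) for x in p], round(expected_value(p, opponent_values), 3))
```

Under this assumed update, the expected value of the predicted opponent policy never decreases from one step to the next, which is the kind of prior knowledge the contribution describes encoding in the opponent model.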
Findings
TDOM-AC outperforms state-of-the-art methods in the evaluated environments, especially mixed cooperative-competitive ones.
TDOM provides superior opponent behavior prediction during testing.
The approach results in more stable training and faster convergence.
Abstract
In multi-agent reinforcement learning, multiple agents learn simultaneously while interacting with a common environment and each other. Because the agents adapt their policies during learning, both the behavior of each individual agent and the environment as that agent perceives it become non-stationary. This makes policy improvement particularly challenging. In this paper, we propose to exploit the fact that the agents seek to improve their expected cumulative reward and introduce a novel Time Dynamical Opponent Model (TDOM) to encode the knowledge that the opponent policies tend to improve over time. We motivate TDOM theoretically by deriving a lower bound of the log objective of an individual agent and further propose Multi-Agent Actor-Critic with Time Dynamical Opponent Model (TDOM-AC). We evaluate the proposed TDOM-AC on a differential game and…
Taxonomy
Topics: Reinforcement Learning in Robotics
