Decision-making with Speculative Opponent Models
Jing Sun, Shuo Chen, Cong Zhang, Yining Ma, Jie Zhang

TL;DR
This paper introduces DOMAC, a novel multi-agent reinforcement learning algorithm that models opponents using only local information, improving decision-making and performance in complex multi-agent environments.
Contribution
We propose DOMAC, the first speculative opponent modelling algorithm that relies solely on local information, with distributional critics and a derived policy gradient theorem.
Findings
DOMAC outperforms state-of-the-art methods in multiple benchmarks.
DOMAC achieves faster convergence in complex multi-agent tasks.
DOMAC effectively models opponent behaviors in challenging environments.
Abstract
Opponent modelling has proven effective in enhancing the decision-making of the controlled agent by constructing models of opponent agents. However, existing methods often rely on access to the observations and actions of opponents, a requirement that is infeasible when such information is either unobservable or challenging to obtain. To address this issue, we introduce Distributional Opponent-aided Multi-agent Actor-Critic (DOMAC), the first speculative opponent modelling algorithm that relies solely on local information (i.e., the controlled agent's observations, actions, and rewards). Specifically, the actor maintains a speculated belief about the opponents using the tailored speculative opponent models that predict the opponents' actions using only local information. Moreover, DOMAC features distributional critic models that estimate the return distribution of the actor's policy,…
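The two components the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: the linear models, the weight shapes, the quantile representation of the return distribution, and every function name (`speculate_opponent`, `actor_policy`, `expected_return`, `pinball_loss`) are assumptions made for illustration. It shows an opponent model that predicts a distribution over opponent actions from the controlled agent's own observation, an actor that marginalises its policy over that speculated belief, and a critic output treated as a set of return quantiles.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, x):
    """Matrix-vector product: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def speculate_opponent(om_weights, obs):
    """Hypothetical opponent model: local observation -> belief over opponent actions."""
    return softmax(linear(om_weights, obs))

def actor_policy(actor_weights, obs, opp_probs):
    """Marginalise an opponent-conditioned policy over the speculated belief.

    actor_weights[k] is the (assumed) weight matrix used when the opponent
    is speculated to take action k.
    """
    n_actions = len(actor_weights[0])
    policy = [0.0] * n_actions
    for opp_action, p_opp in enumerate(opp_probs):
        conditional = softmax(linear(actor_weights[opp_action], obs))
        for a in range(n_actions):
            policy[a] += p_opp * conditional[a]
    return policy

def expected_return(quantiles):
    """Mean of the quantile estimates, used as the expected return (assumption)."""
    return sum(quantiles) / len(quantiles)

def pinball_loss(quantiles, target):
    """Quantile-regression loss of the estimates against one sampled return."""
    n = len(quantiles)
    loss = 0.0
    for i, q in enumerate(quantiles):
        tau = (i + 0.5) / n          # midpoint quantile level
        u = target - q
        loss += u * (tau - (1.0 if u < 0 else 0.0))
    return loss / n

# Toy example: 2-dimensional observation, 3 opponent actions, 2 own actions.
obs = [1.0, 0.5]
om_weights = [[0.2, -0.1], [0.0, 0.3], [-0.4, 0.1]]
actor_weights = [
    [[0.5, 0.0], [0.0, 0.5]],
    [[-0.2, 0.1], [0.3, 0.3]],
    [[0.1, 0.1], [0.1, 0.1]],
]
opp_probs = speculate_opponent(om_weights, obs)
policy = actor_policy(actor_weights, obs, opp_probs)
```

Only `obs` feeds both models here, mirroring the abstract's restriction to local information; how DOMAC actually trains the opponent models and critics is not covered by this sketch.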
Taxonomy
Topics: Reinforcement Learning in Robotics · Anomaly Detection Techniques and Applications
