Learning to Model Opponent Learning
Ian Davies, Zheng Tian, Jun Wang

TL;DR
This paper introduces LeMOL, a novel method for modeling the learning dynamics of opponents in multi-agent reinforcement learning, addressing non-stationarity and improving agent performance.
Contribution
LeMOL is a new structured opponent modeling approach that captures opponent learning dynamics, surpassing naive behavior cloning in accuracy and stability.
Findings
LeMOL outperforms behavior cloning baselines in modeling accuracy.
Opponent modeling with LeMOL enhances multi-agent algorithmic performance.
Structured opponent models better handle non-stationarity in MARL environments.
Abstract
Multi-Agent Reinforcement Learning (MARL) considers settings in which a set of coexisting agents interact with one another and their environment. The adaptation and learning of other agents induces non-stationarity in the environment dynamics. This poses a great challenge for value function-based algorithms whose convergence usually relies on the assumption of a stationary environment. Policy search algorithms also struggle in multi-agent settings as the partial observability resulting from an opponent's actions not being known introduces high variance to policy training. Modelling an agent's opponent(s) is often pursued as a means of resolving the issues arising from the coexistence of learning opponents. An opponent model provides an agent with some ability to reason about other agents to aid its own decision making. Most prior works learn an opponent model by assuming the opponent is…
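The non-stationarity described in the abstract can be illustrated with a toy sketch. This is not LeMOL and not the paper's experimental setup. It assumes a hypothetical two-action opponent and uses a running frequency count as a stand-in for a naive behavior-cloning model. The function name `cloning_error`, the linear policy drift, and every constant are illustrative assumptions.

```python
import random


def cloning_error(true_probs, seed=0):
    """Mean absolute error of a frequency-count opponent model.

    true_probs[t] is the opponent's (hypothetical) probability of playing
    action 0 at step t. The model predicts with the empirical frequency of
    action 0 seen so far, which implicitly treats the opponent as fixed.
    """
    rng = random.Random(seed)
    count_action0 = 0
    total_error = 0.0
    for t, p in enumerate(true_probs):
        # Predict before observing; use a uniform guess when there is no data yet.
        prediction = count_action0 / t if t > 0 else 0.5
        total_error += abs(prediction - p)
        # Observe the opponent's sampled action and update the counts.
        if rng.random() < p:
            count_action0 += 1
    return total_error / len(true_probs)


T = 2000
# Assumption: a fixed opponent that plays action 0 with probability 0.9.
stationary = [0.9] * T
# Assumption: a toy "learning" opponent whose policy drifts linearly from 0.9 to 0.1.
drifting = [0.9 - 0.8 * t / (T - 1) for t in range(T)]

stationary_error = cloning_error(stationary)
drifting_error = cloning_error(drifting)
print(f"stationary opponent: {stationary_error:.3f}")
print(f"drifting opponent:   {drifting_error:.3f}")
```

Under these assumptions the frequency-count model tracks the fixed opponent closely but lags the drifting one, because every past observation is weighted equally even after the opponent's policy has moved on.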
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Domain Adaptation and Few-Shot Learning
