A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity
Pablo Hernandez-Leal, Michael Kaisers, Tim Baarslag, Enrique Munoz de Cote

TL;DR
This survey reviews methods for handling non-stationarity in multiagent learning, categorizing approaches from ignoring to modeling opponents, and discusses their effectiveness across different environments.
Contribution
It introduces a unified framework and taxonomy for classifying algorithms addressing opponent-induced non-stationarity in multiagent systems.
Findings
Classifies algorithms into five categories based on how they approach non-stationarity.
Provides a taxonomy that considers environment characteristics and opponent behavior.
Highlights the strengths and limitations of each approach through illustrative examples.
Abstract
The key challenge in multiagent learning is learning a best response to the behaviour of other agents, which may be non-stationary: if the other agents adapt their strategy as well, the learning target moves. Disparate streams of research have approached non-stationarity from several angles, making a variety of implicit assumptions that make it hard to keep an overview of the state of the art and to validate the innovation and significance of new works. This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning and multi-armed bandits. Further, we reflect on the principal approaches by which algorithms model and cope with this non-stationarity, arriving at a new framework and five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory…
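As a rough illustration of the first two categories named in the abstract ("ignore" and "forget"), the sketch below compares two standard value estimators on a payoff stream that shifts halfway through, as it might when an opponent changes strategy. This is not an algorithm taken from the survey: the function names, the step size of 0.5, and the payoff stream are all assumptions chosen for the example.

```python
def sample_average(rewards):
    """'Ignore'-style estimate: every observation is weighted equally,
    so the estimator implicitly assumes a stationary payoff."""
    estimate, n = 0.0, 0
    for r in rewards:
        n += 1
        estimate += (r - estimate) / n  # incremental mean
    return estimate


def recency_weighted(rewards, step_size=0.5):
    """'Forget'-style estimate: a constant step size exponentially
    discounts old observations, so recent payoffs dominate."""
    estimate = 0.0
    for r in rewards:
        estimate += step_size * (r - estimate)
    return estimate


# Hypothetical payoff stream for one action: the opponent switches
# strategy after 50 rounds and the action stops paying off.
rewards = [1.0] * 50 + [0.0] * 50

ignore_est = sample_average(rewards)    # stays near the long-run mean
forget_est = recency_weighted(rewards)  # tracks the current payoff
```

On this stream the equally weighted average stays close to 0.5 even though the action currently pays nothing, while the recency-weighted estimate ends up close to 0. The step size trades tracking speed against noise sensitivity, and the value used here is arbitrary.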
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Data Stream Mining Techniques
