Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory
Stefanos Leonardos, Georgios Piliouras

TL;DR
This paper investigates how exploration-exploitation dynamics in multi-agent learning can cause abrupt changes in system behavior, using a smooth Q-learning model and catastrophe theory to understand equilibrium shifts and their impact on performance.
Contribution
It introduces a smooth Q-learning framework with theoretical guarantees and links exploration hyperparameter tuning to bifurcations in equilibrium stability in multi-agent systems.
Findings
Bounded regret in arbitrary games.
Convergence to quantal-response equilibria in potential games.
Phase transitions in equilibria as exploration varies.
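A minimal sketch of how the set of quantal-response equilibria can change as exploration varies. It is an illustration under stated assumptions, not the paper's method: the 2x2 coordination game, its payoffs, the logit response, and the names `logit_prob` and `find_qre` are all hypothetical choices made for this example.

```python
import math

def logit_prob(payoff_gap, temperature):
    """Probability of action 0 under a logit (softmax) response."""
    return 1.0 / (1.0 + math.exp(-payoff_gap / temperature))

def find_qre(temperature, start, steps=5000, damping=0.5):
    """Damped fixed-point iteration toward a symmetric logit QRE.

    Assumed (hypothetical) game with identical payoffs for both players:
    both pick action 0 -> 1, both pick action 1 -> 2, mismatch -> 0.
    x is the probability each player assigns to action 0.
    """
    x = start
    for _ in range(steps):
        # Expected payoff of action 0 minus expected payoff of action 1.
        gap = 1.0 * x - 2.0 * (1.0 - x)
        x = (1.0 - damping) * x + damping * logit_prob(gap, temperature)
    return x

# High exploration (temperature): both starting points reach the same QRE.
high_a = find_qre(temperature=5.0, start=0.1)
high_b = find_qre(temperature=5.0, start=0.9)

# Low exploration: the starting point decides which QRE is reached.
low_a = find_qre(temperature=0.1, start=0.1)
low_b = find_qre(temperature=0.1, start=0.9)
```

In this toy game the high-temperature runs agree on a single mixed equilibrium, while the low-temperature runs settle near opposite pure outcomes, which is the kind of change in the equilibrium set the findings refer to.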
Abstract
Exploration-exploitation is a powerful and practical tool in multi-agent learning (MAL); however, its effects are far from understood. To make progress in this direction, we study a smooth analogue of Q-learning. We start by showing that our learning model has strong theoretical justification as an optimal model for studying exploration-exploitation. Specifically, we prove that smooth Q-learning has bounded regret in arbitrary games for a cost model that explicitly captures the balance between game and exploration costs and that it always converges to the set of quantal-response equilibria (QRE), the standard solution concept for games under bounded rationality, in weighted potential games with heterogeneous learning agents. In our main task, we then turn to measure the effect of exploration on collective system performance. We characterize the geometry of the QRE surface in…
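The convergence claim can be illustrated with a short simulation. The sketch below assumes one commonly used continuous-time form of Boltzmann (smooth) Q-learning, discretized with Euler steps; the truncated abstract does not state the equations, so the update rule, the payoff matrix `PAYOFF`, the temperature, and all function names here are assumptions for illustration only.

```python
import math

# Hypothetical identical-interest 2x2 game: entry [a][b] is the payoff to
# both players when one plays a and the other plays b. The matrix is
# symmetric, so the same helper serves both players.
PAYOFF = [[1.0, 0.0], [0.0, 2.0]]

def expected_rewards(opponent):
    """Expected payoff of each action against the opponent's mixed strategy."""
    return [sum(PAYOFF[a][b] * opponent[b] for b in range(2)) for a in range(2)]

def euler_step(x, opponent, temperature, dt):
    """One Euler step of the assumed dynamics:
    dx_a/dt = x_a * [(r_a - mean r) - T * (ln x_a - mean ln x)].
    """
    r = expected_rewards(opponent)
    mean_r = sum(p * ri for p, ri in zip(x, r))
    mean_log = sum(p * math.log(p) for p in x)
    new = [
        p + dt * p * ((ri - mean_r) - temperature * (math.log(p) - mean_log))
        for p, ri in zip(x, r)
    ]
    total = sum(new)  # stays at 1 up to rounding; renormalize for safety
    return [v / total for v in new]

def simulate(x, y, temperature=1.0, dt=0.01, steps=20000):
    """Update both players simultaneously for a fixed number of steps."""
    for _ in range(steps):
        x, y = euler_step(x, y, temperature, dt), euler_step(y, x, temperature, dt)
    return x, y

def logit_response(opponent, temperature):
    """Softmax response to the opponent's strategy; a QRE is its fixed point."""
    weights = [math.exp(ri / temperature) for ri in expected_rewards(opponent)]
    total = sum(weights)
    return [w / total for w in weights]

x_final, y_final = simulate([0.9, 0.1], [0.2, 0.8])
```

At a rest point of these assumed dynamics each player's strategy equals the logit response to the other's, so comparing `x_final` with `logit_response(y_final, 1.0)` is one way to check that the run has reached a QRE of the toy game.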
Taxonomy
Topics: Experimental Behavioral Economics Studies · Game Theory and Applications · Evolutionary Game Theory and Cooperation
