Oracle-free Reinforcement Learning in Mean-Field Games along a Single Sample Path
Muhammad Aneeq uz Zaman, Alec Koppel, Sujay Bhatt, Tamer Başar

TL;DR
This paper introduces Sandbox Learning, an oracle-free reinforcement learning algorithm for mean-field games that approximates the equilibrium from a single sample path of a generic agent, with finite-sample convergence guarantees and empirical validation.
Contribution
It develops a novel two-time-scale algorithm for MFGs that avoids the need for a mean-field oracle, with proven finite-sample convergence and broad applicability.
Findings
Finite sample convergence guarantees for the algorithm.
Sample complexity of O(ε^{-4}) for MFE approximation.
Empirical validation in diverse scenarios, including non-communicating MDPs.
Abstract
We consider online reinforcement learning in Mean-Field Games (MFGs). Unlike traditional approaches, we alleviate the need for a mean-field oracle by developing an algorithm that approximates the Mean-Field Equilibrium (MFE) using the single sample path of the generic agent. We call this Sandbox Learning, as it can be used as a warm-start for any agent learning in a multi-agent non-cooperative setting. We adopt a two time-scale approach in which an online fixed-point recursion for the mean-field operates on a slower time-scale, in tandem with a control policy update on a faster time-scale for the generic agent. Given that the underlying Markov Decision Process (MDP) of the agent is communicating, we provide finite sample convergence guarantees in terms of convergence of the mean-field and control policy to the mean-field equilibrium. The sample complexity of the Sandbox learning…
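The two time-scale idea in the abstract can be illustrated with a toy loop. This is only a sketch under stated assumptions, not the paper's algorithm: tabular Q-learning stands in for the control policy update, a running average of visited states stands in for the mean-field recursion, and the function name `two_timescale_sketch`, the step-size exponents, the exploration rate, and the reward signature are all hypothetical choices made for illustration.

```python
import numpy as np

def two_timescale_sketch(P, reward, n_steps=5000, gamma=0.9, explore=0.1, seed=0):
    """Toy two-time-scale loop along one sample path (illustrative only).

    P[s, a] is a next-state distribution; reward(s, a, mu) is a hypothetical
    reward that depends on the mean-field estimate mu. Both are assumptions.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = P.shape[0], P.shape[1]
    Q = np.zeros((n_states, n_actions))
    mu = np.full(n_states, 1.0 / n_states)  # mean-field estimate over states
    s = 0
    for t in range(1, n_steps + 1):
        alpha = 1.0 / t ** 0.6  # larger step size: fast (policy) time-scale
        beta = 1.0 / t          # smaller step size: slow (mean-field) time-scale
        # Epsilon-greedy action from the current Q estimate.
        if rng.random() < explore:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmax(Q[s]))
        s_next = int(rng.choice(n_states, p=P[s, a]))
        r = reward(s, a, mu)
        # Fast update: one Q-learning step against the current mean-field.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        # Slow update: nudge mu toward the state just visited on this path.
        visited = np.zeros(n_states)
        visited[s_next] = 1.0
        mu += beta * (visited - mu)
        s = s_next
    return Q, mu

# Hypothetical usage: 3 states, 2 actions, uniform transitions, and a
# congestion-style reward that penalizes crowded states.
P = np.full((3, 2, 3), 1.0 / 3.0)
Q, mu = two_timescale_sketch(P, lambda s, a, mu: -mu[s])
```

Because `beta / alpha` tends to zero, the mean-field estimate changes slowly relative to the Q-values, which is the separation of time-scales the abstract describes. No oracle is queried here: both updates use only the single trajectory of one agent.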
Taxonomy
Topics: Auction Theory and Applications · Game Theory and Applications · Advanced Bandit Algorithms Research
