Reinforcement Learning in Non-Stationary Discrete-Time Linear-Quadratic Mean-Field Games
Muhammad Aneeq uz Zaman, Kaiqing Zhang, Erik Miehling, and Tamer Başar

TL;DR
This paper develops an actor-critic reinforcement learning algorithm for non-stationary, infinite-horizon linear-quadratic mean-field games with large populations, providing convergence guarantees and error bounds.
Contribution
It introduces a novel actor-critic RL algorithm that handles the non-stationarity of the mean-field game and the non-causal (backward-in-time) equation it induces, together with a finite-sample convergence analysis.
Findings
Finite-sample convergence guarantees for the algorithm
Samples can be drawn from unmixed Markov chains
Error bound with respect to the Nash equilibrium
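The unmixed-Markov-chain finding can be illustrated with a linear-Gaussian state process, the kind of chain an LQ agent generates when it follows a fixed linear policy. A chain is mixed once its state is distributed according to its stationary distribution; if it starts elsewhere, early samples are biased. The sketch below assumes a stable closed-loop matrix A and Gaussian noise with covariance W (both made up here; the helper names stationary_cov and moments_along_chain are hypothetical). It computes the exact mean and covariance of x_t and compares them with the stationary pair (0, S_inf), where S_inf solves S = A S A' + W. It only illustrates what "unmixed" means and is not the paper's sampling scheme.

```python
import numpy as np


def stationary_cov(A, W):
    """Solve the discrete Lyapunov equation S = A S A' + W.

    Uses vec(A S A') = kron(A, A) vec(S); requires spectral radius of A < 1.
    """
    n = A.shape[0]
    vec_s = np.linalg.solve(np.eye(n * n) - np.kron(A, A), W.reshape(-1))
    return vec_s.reshape(n, n)


def moments_along_chain(A, W, mu0, S0, T):
    """Exact mean and covariance of x_t for x_{t+1} = A x_t + w_t, w_t ~ N(0, W)."""
    mus, covs = [mu0], [S0]
    for _ in range(T):
        mus.append(A @ mus[-1])
        covs.append(A @ covs[-1] @ A.T + W)
    return np.array(mus), np.array(covs)


# Hypothetical stable closed-loop matrix and noise covariance.
A = np.array([[0.9, 0.2], [0.0, 0.7]])
W = 0.1 * np.eye(2)
S_inf = stationary_cov(A, W)

# Chain started at a fixed point far from the stationary mean 0 (S0 = 0).
mus, covs = moments_along_chain(A, W, mu0=np.array([5.0, -3.0]),
                                S0=np.zeros((2, 2)), T=60)
# Distance of the time-t distribution's moments from the stationary ones.
gap = [np.linalg.norm(m) + np.linalg.norm(c - S_inf) for m, c in zip(mus, covs)]

# A sampled path: state t is a draw from N(mus[t], covs[t]), not N(0, S_inf).
rng = np.random.default_rng(0)
x = np.array([5.0, -3.0])
path = [x]
for _ in range(60):
    x = A @ x + rng.multivariate_normal(np.zeros(2), W)
    path.append(x)
```

The gap shrinks geometrically at a rate governed by the spectral radius of A, so states collected before it becomes small are not draws from the stationary distribution; this is the situation the second finding refers to.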
Abstract
In this paper, we study large-population multi-agent reinforcement learning (RL) in the context of discrete-time linear-quadratic mean-field games (LQ-MFGs). Our setting differs from most existing work on RL for MFGs in that we consider a non-stationary MFG over an infinite horizon. We propose an actor-critic algorithm to iteratively compute the mean-field equilibrium (MFE) of the LQ-MFG. There are two primary challenges: i) the non-stationarity of the MFG induces a linear-quadratic tracking problem, which requires solving a backward-in-time (non-causal) equation that cannot be solved by standard (causal) RL algorithms; ii) many RL algorithms assume that the states are sampled from the stationary distribution of a Markov chain (MC), i.e., that the chain is already mixed, an assumption that is not satisfied for real data sources. We first identify that the mean-field trajectory follows…
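Challenge i) can be made concrete with the textbook solution of a discounted LQ tracking problem. This is a minimal sketch under stated assumptions, not the authors' actor-critic algorithm or their exact formulation. It assumes a cost sum_t gamma^t [(x_t - z_t)' Q (x_t - z_t) + u_t' R u_t], noise-free dynamics x_{t+1} = A x_t + B u_t, and a given reference z_t standing in for the mean-field trajectory. The feedback gain K comes from a stationary Riccati equation, but the feedforward term obeys s_t = Q z_t + gamma (A - BK)' s_{t+1}, so computing s_t requires the whole future of z. The horizon truncation s_T = 0, the matrices, and the function names are hypothetical.

```python
import numpy as np


def discounted_riccati(A, B, Q, R, gamma, iters=1000):
    """Iterate P <- Q + g*A'P(A - BK) with K = g*(R + g*B'PB)^{-1} B'PA.

    At the fixed point this is the discounted algebraic Riccati equation.
    Returns the cost matrix P and the feedback gain K (u = -K x + ...).
    """
    P = Q.copy()
    for _ in range(iters):
        K = gamma * np.linalg.solve(R + gamma * B.T @ P @ B, B.T @ P @ A)
        P = Q + gamma * A.T @ P @ (A - B @ K)
    K = gamma * np.linalg.solve(R + gamma * B.T @ P @ B, B.T @ P @ A)
    return P, K


def tracking_offsets(A, B, K, Q, gamma, z):
    """Backward (non-causal) recursion s_t = Q z_t + g*(A - BK)' s_{t+1}.

    Hypothetical truncation: s_T = 0 at the end of the finite reference z.
    """
    T, n = z.shape
    acl_T = (A - B @ K).T            # transpose of the closed-loop matrix
    s = np.zeros((T + 1, n))
    for t in range(T - 1, -1, -1):   # runs backwards in time
        s[t] = Q @ z[t] + gamma * acl_T @ s[t + 1]
    return s


def tracking_control(x, t, P, K, B, R, gamma, s):
    """Optimal input u_t = -K x_t + g*(R + g*B'PB)^{-1} B' s_{t+1}."""
    H = R + gamma * B.T @ P @ B
    return -K @ x + gamma * np.linalg.solve(H, B.T @ s[t + 1])


# Usage with made-up matrices and a made-up reference trajectory z_t.
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[0.5]])
gamma = 0.9
z = np.stack([np.sin(0.1 * np.arange(50)), np.zeros(50)], axis=1)

P, K = discounted_riccati(A, B, Q, R, gamma)
s = tracking_offsets(A, B, K, Q, gamma, z)
u0 = tracking_control(np.zeros(2), 0, P, K, B, R, gamma, s)
```

Unrolling the recursion gives s_t = sum_j gamma^j ((A - BK)')^j Q z_{t+j}, a weighted sum over future reference values. A learner that has only seen z up to time t cannot form this term, which is the non-causality the abstract refers to.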
