Analysis of Multiscale Reinforcement Q-Learning Algorithms for Mean Field Control Games
Andrea Angiuli, Jean-Pierre Fouque, Mathieu Laurière, Mengrui Zhang

TL;DR
This paper proves the convergence of a three-timescale Reinforcement Q-Learning algorithm for Mean Field Control Games (MFCG) in a model-free setting, where the algorithm must follow multiple population distributions.
Contribution
It analyzes a three-timescale RL algorithm for MFCG and proves its convergence, extending the authors' earlier two-timescale analyses of MFG and MFC to a setting that requires following multiple population distributions.
Findings
Convergence of the three-timescale RL algorithm is established.
The algorithm follows multiple population distributions, as the MFCG setting requires.
A simple example illustrates the algorithm's performance and the assumptions used in the convergence proof.
Abstract
Mean Field Control Games (MFCG), introduced in [Angiuli et al., 2022a], represent competitive games between a large number of large collaborative groups of agents in the infinite limit of number and size of groups. In this paper, we prove the convergence of a three-timescale Reinforcement Q-Learning (RL) algorithm to solve MFCG in a model-free approach from the point of view of representative agents. Our analysis uses a Q-table for finite state and action spaces updated at each discrete time-step over an infinite horizon. In [Angiuli et al., 2023], we proved convergence of two-timescale algorithms for MFG and MFC separately, highlighting the need to follow multiple population distributions in the MFC case. Here, we integrate this feature for MFCG as well as three rates of update decreasing to zero in the proper ratios. Our technique of proof uses a generalization to three timescales of…
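The abstract describes the algorithm only at a high level: a Q-table over finite state and action spaces, updated at every step of an infinite-horizon problem, alongside population distributions that each move with their own decreasing learning rate. The Python sketch below illustrates that structure on a hypothetical toy problem and is not the paper's algorithm or benchmark. Everything specific in it is an assumption: the ring-shaped state space and its dynamics, the cost `toy_cost`, the polynomial schedule in `learning_rate`, and the exponents `omega_*`. It also assumes an ordering of timescales in which the global (competitive) distribution `mu` is updated slowest, the local (cooperative) distribution `nu` fastest, and the Q-table in between, by analogy with the two-timescale MFG case (distribution slower than Q) and MFC case (distribution faster than Q). For brevity it tracks a single local distribution, whereas the paper follows multiple population distributions.

```python
import numpy as np

# Hypothetical toy problem illustrating a three-timescale tabular Q-learning loop.
# Not the paper's algorithm, cost, or example; all modeling choices are assumptions.


def learning_rate(n, omega):
    """Hypothetical polynomial step size rho_n = (1 + n)^(-omega).

    For 0.5 < omega <= 1, sum(rho_n) diverges while sum(rho_n^2) converges.
    """
    return 1.0 / (1.0 + n) ** omega


def toy_cost(x, a, mu_global, nu_local, states):
    """Hypothetical running cost for a representative agent (to be minimized)."""
    local_mean = float(np.dot(states, nu_local))
    effort = 0.5 * a ** 2                   # cost of moving
    cohesion = 0.1 * (x - local_mean) ** 2  # stay near own group's mean state
    crowding = mu_global[x]                 # avoid globally crowded states
    return effort + cohesion + crowding


def three_timescale_q_learning(n_states=5, actions=(-1, 0, 1), gamma=0.5,
                               n_steps=20_000, eps=0.1, noise=0.1,
                               omega_nu=0.55, omega_q=0.70, omega_mu=0.85,
                               seed=0):
    """omega_nu < omega_q < omega_mu gives rho_mu/rho_q -> 0 and rho_q/rho_nu -> 0:
    nu moves fastest, Q at an intermediate speed, mu slowest (assumed ordering)."""
    rng = np.random.default_rng(seed)
    states = np.arange(n_states)
    Q = np.zeros((n_states, len(actions)))
    mu = np.full(n_states, 1.0 / n_states)  # global population distribution
    nu = np.full(n_states, 1.0 / n_states)  # local (own-group) distribution
    x = int(rng.integers(n_states))

    for n in range(n_steps):
        # Epsilon-greedy action for a cost-minimizing agent.
        if rng.random() < eps:
            a_idx = int(rng.integers(len(actions)))
        else:
            a_idx = int(np.argmin(Q[x]))
        a = actions[a_idx]
        cost = toy_cost(x, a, mu, nu, states)

        # Hypothetical ring dynamics: intended move, or a random move with prob. noise.
        if rng.random() < noise:
            step = actions[int(rng.integers(len(actions)))]
        else:
            step = a
        x_next = (x + step) % n_states

        rho_nu = learning_rate(n, omega_nu)
        rho_q = learning_rate(n, omega_q)
        rho_mu = learning_rate(n, omega_mu)

        # Both distributions move toward the Dirac mass at the newly visited state.
        visit = np.zeros(n_states)
        visit[x_next] = 1.0
        nu += rho_nu * (visit - nu)
        mu += rho_mu * (visit - mu)

        # Q-learning update for costs: bootstrap with the min over next actions.
        td_target = cost + gamma * Q[x_next].min()
        Q[x, a_idx] += rho_q * (td_target - Q[x, a_idx])
        x = x_next

    return Q, mu, nu


if __name__ == "__main__":
    Q, mu, nu = three_timescale_q_learning()
    print("global mu:", np.round(mu, 3))
    print("local nu: ", np.round(nu, 3))
    print("greedy action index per state:", Q.argmin(axis=1))
```

The exponents are one simple way to make three rates decrease to zero with `rho_mu / rho_q` and `rho_q / rho_nu` both tending to zero; the exact rate conditions required by the convergence proof are those stated in the paper, not the ones chosen here.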
Taxonomy
Topics: Adaptive Dynamic Programming Control
Methods: Q-Learning
