Analysis of Multiscale Reinforcement Q-Learning Algorithms for Mean Field Control Games

Andrea Angiuli; Jean-Pierre Fouque; Mathieu Laurière; Mengrui Zhang

arXiv:2405.17017·math.OC·June 5, 2024

TL;DR

This paper proves the convergence of a three-timescale Reinforcement Q-Learning algorithm for Mean Field Control Games, addressing the complexity of multiple population distributions in a model-free setting.

Contribution

It introduces a novel three-timescale RL algorithm for MFCG and provides a convergence proof, extending previous two-timescale analyses to more complex multi-population scenarios.

Findings

01. Convergence of the three-timescale RL algorithm is established.

02. The algorithm tracks multiple population distributions.

03. A simple example illustrates the algorithm's performance and the theoretical assumptions.

Abstract

Mean Field Control Games (MFCG), introduced in [Angiuli et al., 2022a], represent competitive games between a large number of large collaborative groups of agents in the infinite limit of number and size of groups. In this paper, we prove the convergence of a three-timescale Reinforcement Q-Learning (RL) algorithm to solve MFCG in a model-free approach from the point of view of representative agents. Our analysis uses a Q-table for finite state and action spaces, updated at each discrete time-step over an infinite horizon. In [Angiuli et al., 2023], we proved convergence of two-timescale algorithms for MFG and MFC separately, highlighting the need to follow multiple population distributions in the MFC case. Here, we integrate this feature for MFCG, together with three rates of update decreasing to zero in the proper ratios. Our technique of proof uses a generalization to three timescales of…
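
The three-timescale idea in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' algorithm: the polynomial step-size form, the exponents, the names `mu_slow` and `mu_fast`, the discount `gamma`, and the choice of which quantity gets which rate are all assumptions made for illustration. In the paper's setting the cost would also depend on the population distributions; here it is passed in as a plain number.

```python
import numpy as np

def rate(k, omega):
    """Polynomially decaying step size 1 / (1 + k)**omega (assumed form)."""
    return 1.0 / (1.0 + k) ** omega

# Hypothetical exponents: a larger exponent decays faster, giving a slower timescale.
OMEGA_FAST, OMEGA_MID, OMEGA_SLOW = 0.55, 0.75, 0.95

def three_timescale_step(Q, mu_slow, mu_fast, x, a, cost, x_next, k, gamma=0.9):
    """One asynchronous tabular update with three step sizes (illustrative only)."""
    one_hot = np.zeros_like(mu_slow)
    one_hot[x_next] = 1.0
    # Each distribution estimate moves toward the indicator of the observed state,
    # one with the slowest rate and one with the fastest rate.
    mu_slow = mu_slow + rate(k, OMEGA_SLOW) * (one_hot - mu_slow)
    mu_fast = mu_fast + rate(k, OMEGA_FAST) * (one_hot - mu_fast)
    # Standard Q-learning target for cost minimization, using the middle rate.
    target = cost + gamma * Q[x_next].min()
    Q = Q.copy()
    Q[x, a] += rate(k, OMEGA_MID) * (target - Q[x, a])
    return Q, mu_slow, mu_fast
```

With these exponents, all three rates decrease to zero and the ratios slow/middle and middle/fast also tend to zero as `k` grows, which is the kind of separation that "proper ratios" suggests. The exact conditions are those stated in the paper, not the ones chosen here.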

Taxonomy

Topics: Adaptive Dynamic Programming Control

Methods: Q-Learning