Model-Free Mean-Field Reinforcement Learning: Mean-Field MDP and Mean-Field Q-Learning
René Carmona, Mathieu Laurière, Zongjun Tan

TL;DR
This paper casts infinite-horizon discounted Mean Field Control (MFC) problems with common noise as Mean Field Markov Decision Processes (MFMDP), relates MFC policies to Markov policies of the MFMDP, and adapts tabular and deep RL methods to this setting, with convergence guarantees and numerical demonstrations.
Contribution
It develops a novel mean-field RL framework connecting MFC and MFMDP, proposing adapted RL algorithms with convergence guarantees for population-based control problems.
Findings
Existence of optimal closed-loop policies for MFC.
Development of RL algorithms for mean-field settings.
Numerical validation of the proposed methods.
Abstract
We study infinite horizon discounted Mean Field Control (MFC) problems with common noise through the lens of Mean Field Markov Decision Processes (MFMDP). We allow the agents to use actions that are randomized not only at the individual level but also at the level of the population. This common randomization allows us to establish connections between both closed-loop and open-loop policies for MFC and Markov policies for the MFMDP. In particular, we show that there exists an optimal closed-loop policy for the original MFC. Building on this framework and the notion of state-action value function, we then propose reinforcement learning (RL) methods for such problems, by adapting existing tabular and deep RL methods to the mean-field setting. The main difficulty is the treatment of the population state, which is an input of the policy and the value function. We provide convergence…
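The idea of a state-action value function whose state is the population distribution can be illustrated with a minimal tabular sketch. Everything below is an assumption made for illustration and is not the paper's model or algorithm: a two-state space where the population is summarized by the mass in one state, a uniform grid of `n_bins + 1` points standing in for the discretized simplex, three hypothetical population-level actions (shift mass down, hold, shift up), a quadratic reward around a hypothetical `target` mass, and a `common_noise` probability of a random shock that moves the whole population by one grid step. The update itself is standard tabular Q-learning applied to the discretized population state.

```python
import numpy as np

# Hypothetical population-level actions: shift mass down, hold, shift up (one grid step).
ACTIONS = (-1, 0, +1)


def step(idx, a, n_bins, target, common_noise, rng):
    """Toy mean-field transition on a discretized two-state simplex.

    idx is the grid index of the mass in state 1, i.e. mu = idx / n_bins.
    This dynamics and reward are illustrative assumptions only.
    """
    nxt = idx + ACTIONS[a]
    # Common noise: a shock shared by the whole population.
    if rng.random() < common_noise:
        nxt += int(rng.choice((-1, 1)))
    nxt = int(np.clip(nxt, 0, n_bins))
    reward = -((nxt / n_bins) - target) ** 2
    return nxt, reward


def train_mean_field_q(n_bins=10, target=0.5, gamma=0.9, alpha=0.5,
                       episodes=500, horizon=50, common_noise=0.1, seed=0):
    """Tabular Q-learning where the 'state' is the discretized population distribution."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_bins + 1, len(ACTIONS)))
    for _ in range(episodes):
        idx = int(rng.integers(0, n_bins + 1))  # random initial distribution
        for _ in range(horizon):
            # Uniform exploration; Q-learning is off-policy, so this is allowed.
            a = int(rng.integers(0, len(ACTIONS)))
            nxt, r = step(idx, a, n_bins, target, common_noise, rng)
            # Standard Q-learning update on the (distribution, action) pair.
            Q[idx, a] += alpha * (r + gamma * Q[nxt].max() - Q[idx, a])
            idx = nxt
    return Q
```

Calling `train_mean_field_q(common_noise=0.0)` and taking `Q.argmax(axis=1)` gives a greedy rule that maps each grid point of the population distribution to one of the three toy actions. The grid grows quickly with the number of individual states, which is one way to see why the population state is the hard part of the problem and why function approximation becomes attractive.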
Taxonomy
Topics: Simulation Techniques and Applications · Advanced Control Systems Optimization · Reinforcement Learning in Robotics
Methods: Q-Learning
