Scalable Offline Reinforcement Learning for Mean Field Games
Axel Brunnbauer, Julian Lemmel, Zahra Babaiee, Sophie Neubauer, Radu Grosu

TL;DR
This paper introduces Off-MMD, a scalable offline reinforcement learning algorithm for mean-field games that estimates equilibrium policies solely from static datasets, avoiding the need for online interactions or environment models.
Contribution
The paper proposes an offline mean-field RL method that combines mirror descent with importance sampling to estimate the mean-field distribution from static data, and it borrows offline RL techniques to curb Q-value overestimation.
Findings
Off-MMD performs well on benchmark tasks like crowd exploration.
The algorithm is robust to low-quality datasets.
Sensitivity analysis shows stability across hyperparameters.
Abstract
Reinforcement learning algorithms for mean-field games offer a scalable framework for optimizing policies in large populations of interacting agents. Existing methods often depend on online interactions or access to system dynamics, limiting their practicality in real-world scenarios where such interactions are infeasible or difficult to model. In this paper, we present Offline Munchausen Mirror Descent (Off-MMD), a novel mean-field RL algorithm that approximates equilibrium policies in mean-field games using purely offline data. By leveraging iterative mirror descent and importance sampling techniques, Off-MMD estimates the mean-field distribution from static datasets without relying on simulation or environment dynamics. Additionally, we incorporate techniques from offline reinforcement learning to address common issues like Q-value overestimation, ensuring robust policy learning even…
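The two ingredients named in the abstract, a mirror descent policy update and an importance-sampled estimate of the mean-field distribution, can be illustrated with a minimal tabular sketch. This is not the authors' implementation. It assumes a finite horizon, discrete states and actions, a known behavior policy `beta` that generated the dataset, and transitions that do not depend on the mean field. The function names, the array layout `[horizon, n_states, n_actions]`, and the temperature `tau` are hypothetical choices made for this example.

```python
import numpy as np


def mirror_descent_step(pi, q, tau=1.0):
    """One tabular mirror descent update: pi_new(a|s) ∝ pi(a|s) * exp(q(s,a) / tau).

    pi and q share a shape whose last axis indexes actions.
    """
    # Clip before the log so zero-probability actions do not produce -inf.
    logits = np.log(np.clip(pi, 1e-12, None)) + q / tau
    # Subtract the per-state maximum for numerical stability.
    logits = logits - logits.max(axis=-1, keepdims=True)
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=-1, keepdims=True)


def estimate_mean_field(trajectories, pi, beta, n_states, horizon):
    """Self-normalized importance-sampling estimate of the state distribution.

    trajectories: list of length-`horizon` lists of (state, action) pairs
                  collected under the behavior policy `beta`.
    pi, beta:     arrays of shape [horizon, n_states, n_actions].
    Returns mu of shape [horizon, n_states], an estimate of the state
    distribution that the target policy `pi` would induce at each step.
    """
    mu = np.zeros((horizon, n_states))
    for traj in trajectories:
        w = 1.0  # product of pi/beta ratios over the actions taken before step t
        for t, (s, a) in enumerate(traj):
            mu[t, s] += w
            w *= pi[t, s, a] / beta[t, s, a]
    totals = mu.sum(axis=1, keepdims=True)
    # Fall back to a uniform distribution for steps where all weights are zero.
    uniform = np.full_like(mu, 1.0 / n_states)
    return np.divide(mu, totals, out=uniform, where=totals > 0)


if __name__ == "__main__":
    horizon, n_states, n_actions = 2, 2, 2
    beta = np.full((horizon, n_states, n_actions), 0.5)
    data = [[(0, 0), (1, 0)], [(0, 1), (0, 0)]]
    q = np.zeros((horizon, n_states, n_actions))
    q[..., 0] = 1.0  # toy Q-values that favor action 0 everywhere
    pi = mirror_descent_step(beta, q, tau=0.5)
    print(estimate_mean_field(data, pi, beta, n_states, horizon))
```

In an iterative scheme of this kind, the two functions would alternate: the current policy is reweighted against the dataset to estimate the mean field, Q-values are fitted against that mean field, and the policy is then updated with a mirror descent step. The Q-fitting stage, where the offline RL regularization mentioned in the abstract would apply, is left out of the sketch.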
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research
