Mean-Field Reinforcement Learning without Synchrony
Shan Yang

TL;DR
This paper introduces the Temporal Mean Field (TMF) framework, which extends mean-field multi-agent reinforcement learning to asynchronous settings by using the population distribution, rather than the mean action, as the summary statistic.
Contribution
The paper develops a new TMF framework based on the population distribution, proving equilibrium existence, finite-population approximation, and convergence of a policy gradient method.
Findings
TMF-PG achieves near-identical performance regardless of how frequently agents act.
The finite-population approximation error decays at a rate of O(1/√N).
The framework unifies synchronous and asynchronous decision-making in mean-field RL.
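To give a feel for what an O(1/√N) rate means, the sketch below measures how far an empirical population distribution built from N sampled agents lies from the distribution it was sampled from. This is only the generic sampling rate for an empirical distribution, not the paper's bound or its setting; the three-observation distribution `probs`, the helper names, and the trial count are all assumptions made for illustration.

```python
import random


def empirical_error(n_agents, probs, rng):
    """L1 distance between the empirical distribution of n_agents samples and probs."""
    counts = [0] * len(probs)
    # Each agent independently lands on one observation (an illustrative assumption).
    for obs in rng.choices(range(len(probs)), weights=probs, k=n_agents):
        counts[obs] += 1
    return sum(abs(c / n_agents - p) for c, p in zip(counts, probs))


def average_error(n_agents, probs, trials=200, seed=0):
    """Average the L1 error over several independent trials with a fixed seed."""
    rng = random.Random(seed)
    total = sum(empirical_error(n_agents, probs, rng) for _ in range(trials))
    return total / trials


# Hypothetical distribution over three observations.
probs = [0.5, 0.3, 0.2]
errors = {n: average_error(n, probs) for n in (100, 400, 1600)}

for n, err in errors.items():
    # Under a 1/sqrt(N) rate, err * sqrt(n) should stay roughly constant.
    print(n, round(err, 4), round(err * n ** 0.5, 3))
```

Quadrupling the number of agents should roughly halve the average error, which is the signature of a 1/√N rate.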
Abstract
Mean-field reinforcement learning (MF-RL) scales multi-agent RL to large populations by reducing each agent's dependence on others to a single summary statistic -- the mean action. However, this reduction requires every agent to act at every time step; when some agents are idle, the mean action is simply undefined. Addressing asynchrony therefore requires a different summary statistic -- one that remains defined regardless of which agents act. The population distribution -- the fraction of agents at each observation -- satisfies this requirement: its dimension is independent of N, and under exchangeability it fully determines each agent's reward and transition. Existing MF-RL theory, however, is built on the mean action and does not extend to . We therefore construct the Temporal Mean Field (TMF) framework around the population distribution …
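The contrast between the two summary statistics can be sketched in a few lines. The code below is a minimal illustration, not the paper's implementation: encoding an idle agent's action as `None`, the discrete observation indices, and the helper names `mean_action` and `population_distribution` are all assumptions introduced here.

```python
from collections import Counter


def mean_action(actions):
    """Mean over all agents' actions; returns None (undefined) if any agent is idle."""
    if any(a is None for a in actions):
        return None
    return sum(actions) / len(actions)


def population_distribution(observations, num_observations):
    """Fraction of agents at each observation index, whether or not they act."""
    counts = Counter(observations)
    n_agents = len(observations)
    return [counts.get(o, 0) / n_agents for o in range(num_observations)]


# Six agents, three possible observations; agents 1 and 4 are idle this step.
observations = [0, 2, 2, 1, 2, 0]
actions = [1.0, None, 1.0, 0.0, None, 1.0]

print(mean_action(actions))                      # undefined because some agents are idle
mu = population_distribution(observations, 3)    # defined for every agent
print(mu)

# The statistic has one entry per observation, however many agents there are.
mu_large = population_distribution(observations * 10, 3)
print(len(mu), len(mu_large))
```

The population distribution is computed from where agents are, not from what they do, so it stays well defined when any subset of agents is idle, and its length depends only on the number of observations.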
Taxonomy
Topics: Reinforcement Learning in Robotics · Age of Information Optimization · Game Theory and Applications
