Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic
Yufeng Zhang, Siyu Chen, Zhuoran Yang, Michael I. Jordan, Zhaoran Wang

TL;DR
This paper gives a mean-field analysis of neural actor-critic algorithms with overparameterized two-layer networks, showing that they converge to the globally optimal policy at a sublinear rate while the feature representation evolves during training.
Contribution
It introduces a mean-field framework for neural actor-critic algorithms, demonstrating convergence and feature evolution in the infinite-width, continuous-time limit.
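For orientation, the infinite-width limit of a two-layer network is commonly written as below. This is a generic mean-field form and not necessarily the paper's exact notation: the width N, scaling factor \alpha, activation \sigma, and parameter distribution \rho are assumed symbols introduced here for illustration.

```latex
% Finite-width two-layer network: an average over N neurons with parameters \theta_i
f\bigl(x; \theta^{(N)}\bigr) = \frac{\alpha}{N} \sum_{i=1}^{N} \sigma(x; \theta_i)

% Infinite-width (mean-field) limit: the empirical average becomes an
% integral against a distribution \rho over neuron parameters
f(x; \rho) = \alpha \int \sigma(x; \theta) \, \mathrm{d}\rho(\theta)
```

In this view, training moves the distribution \rho rather than a fixed set of weights, which is why the features are free to change during training.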
Findings
Neural AC converges to the globally optimal policy at a sublinear rate.
Feature representations evolve within a neighborhood of the initial features.
The analysis applies to overparameterized two-layer neural networks with two-timescale updates.
Abstract
Actor-critic (AC) algorithms, empowered by neural networks, have had significant empirical success in recent years. However, most of the existing theoretical support for AC algorithms focuses on the case of linear function approximations, or linearized neural networks, where the feature representation is fixed throughout training. Such a limitation fails to capture the key aspect of representation learning in neural AC, which is pivotal in practical problems. In this work, we take a mean-field perspective on the evolution and convergence of feature-based neural AC. Specifically, we consider a version of AC where the actor and critic are represented by overparameterized two-layer neural networks and are updated with two-timescale learning rates. The critic is updated by temporal-difference (TD) learning with a larger stepsize while the actor is updated via proximal policy optimization…
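The critic side of this setup can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes a tanh activation, a 1/N mean-field scaling with a hypothetical factor `alpha`, and a plain semi-gradient TD(0) update. The function names `mean_field_net` and `td_critic_step` are made up for this example, and the actor's proximal policy optimization step is omitted.

```python
import math
import random


def mean_field_net(x, theta, alpha=1.0):
    """Two-layer network in mean-field scaling: alpha * (1/N) * sum_i tanh(<theta_i, x>)."""
    acts = [math.tanh(sum(w * xi for w, xi in zip(neuron, x))) for neuron in theta]
    return alpha * sum(acts) / len(acts)


def td_critic_step(theta, x, reward, x_next, gamma, lr, alpha=1.0):
    """One semi-gradient TD(0) step on the critic; returns (new_theta, td_error)."""
    # TD error: current estimate minus the bootstrapped target.
    delta = (mean_field_net(x, theta, alpha)
             - reward
             - gamma * mean_field_net(x_next, theta, alpha))
    new_theta = []
    for neuron in theta:
        act = math.tanh(sum(w * xi for w, xi in zip(neuron, x)))
        # Gradient of the output w.r.t. this neuron, rescaled by N
        # (an assumed mean-field convention so each neuron moves at rate O(lr)).
        g = alpha * (1.0 - act * act)
        new_theta.append([w - lr * delta * g * xi for w, xi in zip(neuron, x)])
    return new_theta, delta


# Usage: repeat the critic update on a single toy transition (all values hypothetical).
random.seed(0)
theta = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(256)]
x, x_next = [0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]
for _ in range(20):
    theta, td_error = td_critic_step(theta, x, reward=0.3, x_next=x_next, gamma=0.9, lr=0.5)
```

In a two-timescale scheme the critic would take steps like this with the larger stepsize, so that its value estimate tracks the slowly changing actor.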
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
