The Actor-Critic Update Order Matters for PPO in Federated Reinforcement Learning
Zhijie Xie, Shenghui Song

TL;DR
This paper introduces FedRAC, a federated variant of PPO that updates the actor before the critic, making convergence less sensitive to data heterogeneity across clients.
Contribution
It proposes reversing the PPO update order to actor first, critic second, and provides theoretical and empirical evidence of its benefits in federated RL.
Findings
FedRAC achieves higher rewards in multiple environments.
The algorithm converges faster than PPO with the conventional critic-first update order.
Performance remains robust under data heterogeneity.
Abstract
In the context of Federated Reinforcement Learning (FRL), applying Proximal Policy Optimization (PPO) faces challenges related to the update order of its actor and critic due to the aggregation step occurring between successive iterations. In particular, when local actors are updated based on local critic estimations, the algorithm becomes vulnerable to data heterogeneity. As a result, the conventional update order in PPO (critic first, then actor) may cause heterogeneous gradient directions among clients, hindering convergence to a globally optimal policy. To address this issue, we propose FedRAC, which reverses the update order (actor first, then critic) to eliminate the divergence of critics from different clients. Theoretical analysis shows that the convergence bound of FedRAC is immune to data heterogeneity under mild conditions, i.e., bounded level of heterogeneity and accurate…
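The difference between the two update orders can be sketched with a toy example. This is not the authors' implementation: it assumes a linear critic, omits the actor step itself, and only tracks which critic weights the actor's advantage estimates would be computed from in a given round. The names `critic_step`, `local_round`, and `aggregate` are hypothetical.

```python
import numpy as np

def critic_step(w, states, returns, lr=0.1):
    """One gradient step of least-squares value regression, V(s) = states @ w."""
    grad = states.T @ (states @ w - returns) / len(returns)
    return w - lr * grad

def local_round(w_global, states, returns, order):
    """One local round on a client (toy stand-in, actor update omitted).

    Returns (critic weights the actor step would use, updated local critic).
    """
    if order == "critic_first":
        # Conventional order: the critic is fitted to local data first,
        # so the actor step sees a client-specific critic.
        w_local = critic_step(w_global, states, returns)
        w_for_actor = w_local
    elif order == "actor_first":
        # Reversed order: the actor step sees the critic received from
        # the server, which is identical on every client.
        w_for_actor = w_global
        w_local = critic_step(w_global, states, returns)
    else:
        raise ValueError(f"unknown order: {order}")
    return w_for_actor, w_local

def aggregate(weights):
    """Server-side averaging of client critics (assumed plain mean)."""
    return np.mean(weights, axis=0)

# Two clients with the same states but heterogeneous returns (made-up data).
states = np.eye(2)
client_returns = [np.array([1.0, 2.0]), np.array([3.0, -1.0])]
w_global = np.zeros(2)

for order in ("critic_first", "actor_first"):
    results = [local_round(w_global, states, r, order) for r in client_returns]
    critics_seen_by_actors = [w_for_actor for w_for_actor, _ in results]
    same = np.allclose(critics_seen_by_actors[0], critics_seen_by_actors[1])
    print(order, "-> actors share one critic:", same)
```

In this sketch the local critics still drift apart during the round under either order; what changes is only whether that drift reaches the actor step before the server averages the critics.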
Taxonomy
Topics: Modular Robots and Swarm Intelligence · Reinforcement Learning in Robotics
