The Actor-Critic Update Order Matters for PPO in Federated Reinforcement Learning

Zhijie Xie; Shenghui Song

arXiv:2506.01261 · cs.LG · June 3, 2025


TL;DR

This paper introduces FedRAC, a PPO variant for federated reinforcement learning that reverses the actor-critic update order, improving convergence and performance by addressing data heterogeneity.

Contribution

It proposes reversing the PPO update order to actor first, critic second, and provides theoretical and empirical evidence of its benefits in federated RL.

Findings

01. FedRAC achieves higher rewards in multiple environments.
02. The algorithm converges faster than traditional PPO.
03. Performance remains robust under data heterogeneity.

Abstract

In the context of Federated Reinforcement Learning (FRL), applying Proximal Policy Optimization (PPO) faces challenges related to the update order of its actor and critic due to the aggregation step occurring between successive iterations. In particular, when local actors are updated based on local critic estimations, the algorithm becomes vulnerable to data heterogeneity. As a result, the conventional update order in PPO (critic first, then actor) may cause heterogeneous gradient directions among clients, hindering convergence to a globally optimal policy. To address this issue, we propose FedRAC, which reverses the update order (actor first, then critic) to eliminate the divergence of critics from different clients. Theoretical analysis shows that the convergence bound of FedRAC is immune to data heterogeneity under mild conditions, i.e., a bounded level of heterogeneity and accurate…
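To make the ordering concrete, here is a minimal sketch of one communication round under both orders. It is a toy, not the authors' implementation: the actor and critic are single scalars, the critic step is one gradient step toward the client's local mean return, the actor step moves along a plain advantage (local return minus critic baseline) with no PPO clipping, and the server is assumed to average parameters FedAvg-style. `Client`, `update_critic`, `update_actor`, and `fed_round` are hypothetical names. What it shows is which critic each client's actor step sees: with critic first, each actor uses its own freshly updated local critic, and these diverge across heterogeneous clients; with actor first, every actor uses the same aggregated critic received from the server.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Client:
    # Hypothetical toy client: its heterogeneous local data is summarized
    # by a single mean return (an assumption for illustration only).
    local_return: float

    def update_critic(self, v: float, lr: float = 0.5) -> float:
        # One gradient step on 0.5 * (v - local_return)^2, pulling the
        # scalar critic toward this client's own return.
        return v - lr * (v - self.local_return)

    def update_actor(self, theta: float, v: float, lr: float = 0.1) -> float:
        # Toy actor step along the advantage computed with critic v
        # (no clipped surrogate, no GAE; not PPO itself).
        return theta + lr * (self.local_return - v)


def fed_round(clients, theta, v, actor_first):
    """Run one round; return the averaged (theta, v) and the critic value
    each client's actor step actually used."""
    thetas, vs, critics_used = [], [], []
    for c in clients:
        if actor_first:
            # FedRAC-style order: the actor uses the global critic v,
            # identical on every client, then the critic is updated.
            critics_used.append(v)
            new_theta = c.update_actor(theta, v)
            new_v = c.update_critic(v)
        else:
            # Conventional PPO order: the critic is updated locally first,
            # so each actor sees a client-specific critic.
            new_v = c.update_critic(v)
            critics_used.append(new_v)
            new_theta = c.update_actor(theta, new_v)
        thetas.append(new_theta)
        vs.append(new_v)
    # Server aggregation: plain parameter averaging (assumed FedAvg-style).
    return fmean(thetas), fmean(vs), critics_used


if __name__ == "__main__":
    clients = [Client(local_return=0.0), Client(local_return=10.0)]
    _, _, used = fed_round(clients, theta=0.0, v=5.0, actor_first=False)
    print(used)  # [2.5, 7.5]: each actor saw a different local critic
    _, _, used = fed_round(clients, theta=0.0, v=5.0, actor_first=True)
    print(used)  # [5.0, 5.0]: every actor saw the same global critic
```

With two clients whose local returns are 0 and 10 and a global critic of 5, the critic-first round feeds the actors the divergent critics 2.5 and 7.5, while the actor-first round feeds both actors 5.0. The toy does not reproduce the paper's convergence results; it only shows where data heterogeneity enters the actor update under each order.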


Taxonomy

Topics: Modular Robots and Swarm Intelligence · Reinforcement Learning in Robotics