On Global Convergence Rates for Federated Softmax Policy Gradient under Heterogeneous Environments
Safwan Labbi, Paul Mangold, Daniil Tiapkin, Eric Moulines

TL;DR
This paper establishes global convergence rates for federated softmax policy gradient (FedPG) with local training, shows that the gap to optimality is controlled by the level of heterogeneity across agents, and gives rates with explicit constants for the entropy-regularized case.
Contribution
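It gives global convergence rates for both vanilla and entropy-regularized federated softmax policy gradient under heterogeneous environments, built on a new analysis of federated averaging for non-convex objectives. For the entropy-regularized case, these are the first rates with explicit constants.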
Findings
FedPG converges to a near-optimal policy with a heterogeneity-dependent gap.
The paper obtains the first convergence rates with explicit constants for entropy-regularized policy gradient, using a projection-like operator.
Federated objectives may inherently require stochastic policies, unlike single-agent cases.
Abstract
We provide global convergence rates for vanilla and entropy-regularized federated softmax stochastic policy gradient (FedPG) with local training. We show that FedPG converges to a near-optimal policy in terms of the average agent value, with a gap controlled by the level of heterogeneity. Remarkably, we obtain the first convergence rates for entropy-regularized policy gradient with explicit constants, leveraging a projection-like operator. Our results build upon a new analysis of federated averaging for non-convex objectives, based on the observation that the Łojasiewicz-type inequalities from the single-agent setting (Mei et al., 2020) do not hold for the federated objective. This uncovers a fundamental difference between single-agent and federated reinforcement learning: while single-agent optimal policies can be deterministic, federated objectives may inherently require stochastic…
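The local-training-then-averaging loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm or experimental setup: it assumes a single-state (bandit) problem, uses exact gradients instead of stochastic ones, omits entropy regularization, and all names (`fed_pg`, `local_pg_steps`, `agent_rewards`) and hyperparameters are hypothetical. In this toy case all agents share the same best action, so it does not exhibit the stochastic-optimal-policy phenomenon mentioned above; it only shows the mechanics of federated averaging over softmax policy parameters.

```python
import numpy as np


def softmax(theta):
    """Numerically stable softmax over action logits."""
    z = theta - theta.max()
    e = np.exp(z)
    return e / e.sum()


def local_pg_steps(theta, rewards, lr, num_local_steps):
    """Run exact softmax policy gradient ascent on one agent's bandit.

    Assumption: a single-state problem, where the gradient of the expected
    reward with respect to the logits is pi * (r - <pi, r>).
    """
    theta = theta.copy()
    for _ in range(num_local_steps):
        pi = softmax(theta)
        grad = pi * (rewards - pi @ rewards)
        theta += lr * grad
    return theta


def fed_pg(agent_rewards, lr=0.5, num_local_steps=5, num_rounds=200):
    """Federated averaging of softmax logits with local training.

    agent_rewards has shape (num_agents, num_actions); each row is one
    agent's (heterogeneous) reward vector. Hyperparameters are illustrative.
    """
    _, num_actions = agent_rewards.shape
    theta = np.zeros(num_actions)  # shared logits, uniform initial policy
    for _ in range(num_rounds):
        # Each agent starts from the shared parameters and trains locally.
        local_thetas = [
            local_pg_steps(theta, r, lr, num_local_steps) for r in agent_rewards
        ]
        # The server averages the local parameters.
        theta = np.mean(local_thetas, axis=0)
    return theta


# Hypothetical heterogeneous rewards for three agents and three actions.
agent_rewards = np.array([
    [1.0, 0.2, 0.0],
    [0.8, 0.3, 0.1],
    [0.9, 0.1, 0.2],
])

theta = fed_pg(agent_rewards)
policy = softmax(theta)
# Average agent value of the learned policy (the federated objective here).
avg_value = policy @ agent_rewards.mean(axis=0)
```

Here `avg_value` is the quantity the abstract calls the average agent value, specialized to this toy bandit. Parameter averaging happens only once per round, after `num_local_steps` local updates, which is where heterogeneity between agents can bias the result in the general setting.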
