On Global Convergence Rates for Federated Softmax Policy Gradient under Heterogeneous Environments

Safwan Labbi; Paul Mangold; Daniil Tiapkin; Eric Moulines

arXiv:2505.23459·cs.LG·April 2, 2026


TL;DR

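This paper establishes global convergence rates for vanilla and entropy-regularized federated softmax policy gradient (FedPG) with local training. It shows that the optimality gap is controlled by the level of heterogeneity across agents, and it gives rates with explicit constants for the entropy-regularized case.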

Contribution

It provides the first convergence rates with explicit constants for entropy-regularized policy gradient, built on a new analysis of federated averaging for non-convex objectives under heterogeneous environments.

Findings

01

FedPG converges to a near-optimal policy in terms of the average agent value, with a gap controlled by the level of heterogeneity.

02

The analysis yields the first convergence rates with explicit constants for entropy-regularized policy gradient, obtained by leveraging a projection-like operator.

03

Federated objectives may inherently require stochastic policies, unlike single-agent cases.
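The last finding can be illustrated with a toy construction. This example is hypothetical and is not taken from the paper: two agents share one policy over two actions in a single non-terminal state. In environment 1, the first action ends the episode with reward 1 while the second action gives reward 0 and stays in the state; environment 2 swaps the roles of the actions. With discount `gamma` and probability `p` of the first action, the value in environment 1 solves V = p + (1 - p) * gamma * V. The sketch below evaluates the average of the two agents' values.

```python
# Hypothetical two-environment example (not from the paper).
# p is the probability that the shared policy picks the first action.

def value_env1(p, gamma):
    # Solves V = p * 1 + (1 - p) * gamma * V for V.
    return p / (1.0 - gamma * (1.0 - p))

def value_env2(p, gamma):
    # Environment 2 is environment 1 with the two actions swapped.
    return value_env1(1.0 - p, gamma)

def average_value(p, gamma=0.9):
    # Federated objective: the mean of the agents' values under one policy.
    return 0.5 * (value_env1(p, gamma) + value_env2(p, gamma))

# Search a grid of policies for the best average value.
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=average_value)

print("deterministic policy (p=1):", average_value(1.0))
print("uniform policy (p=0.5):", average_value(0.5))
print("best p on the grid:", best_p)
```

In this construction each deterministic policy is optimal for one agent and earns nothing for the other, so the average value is 0.5, while the uniform policy earns more than 0.9 on average when `gamma = 0.9`. Either environment alone admits a deterministic optimal policy; only the averaged objective favors randomization.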

Abstract

We provide global convergence rates for vanilla and entropy-regularized federated softmax stochastic policy gradient (FedPG) with local training. We show that FedPG converges to a near-optimal policy in terms of the average agent value, with a gap controlled by the level of heterogeneity. Remarkably, we obtain the first convergence rates for entropy-regularized policy gradient with explicit constants, leveraging a projection-like operator. Our results build upon a new analysis of federated averaging for non-convex objectives, based on the observation that the Łojasiewicz-type inequalities from the single-agent setting (Mei et al., 2020) do not hold for the federated objective. This uncovers a fundamental difference between single-agent and federated reinforcement learning: while single-agent optimal policies can be deterministic, federated objectives may inherently require stochastic…
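To make the overall scheme concrete, here is a minimal sketch of federated softmax policy gradient with local training. It is a simplified illustration under stated assumptions, not the paper's algorithm or setting: each agent faces a single-state problem (a bandit) with its own deterministic reward vector, local updates use a REINFORCE-style stochastic gradient, there is no entropy regularization, and the server averages the softmax parameters after each round. The names `fedpg`, `local_training`, and `agent_rewards`, and all hyperparameter values, are hypothetical.

```python
import math
import random

def softmax(theta):
    # Numerically stable softmax over a list of logits.
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def local_training(theta, rewards, steps, lr, rng):
    # Run several stochastic policy gradient steps on one agent's environment.
    theta = list(theta)
    for _ in range(steps):
        pi = softmax(theta)
        a = rng.choices(range(len(pi)), weights=pi)[0]
        r = rewards[a]
        # Score-function estimate: r * grad log pi(a) = r * (e_a - pi).
        for b in range(len(theta)):
            indicator = 1.0 if b == a else 0.0
            theta[b] += lr * r * (indicator - pi[b])
    return theta

def fedpg(agent_rewards, rounds=200, local_steps=5, lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * len(agent_rewards[0])
    for _ in range(rounds):
        # Every agent starts the round from the shared parameters.
        local = [local_training(theta, r, local_steps, lr, rng)
                 for r in agent_rewards]
        # Server step: average the agents' parameters coordinate-wise.
        theta = [sum(col) / len(local) for col in zip(*local)]
    return theta

# Two heterogeneous agents with different reward vectors (made-up values).
agent_rewards = [[1.0, 0.0, 0.5], [0.0, 0.2, 0.9]]
theta = fedpg(agent_rewards)
policy = softmax(theta)
print("shared policy:", policy)
```

Increasing `local_steps` reduces communication but lets each agent's parameters drift toward its own environment before averaging, which is one informal way to see why heterogeneity can enter the optimality gap.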
