Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning
Tong Yang, Shicong Cen, Yuting Wei, Yuxin Chen, Yuejie Chi

TL;DR
This paper introduces federated natural policy gradient and actor critic methods for multi-task reinforcement learning, with near dimension-free convergence guarantees in decentralized settings where agents share a transition kernel but keep private reward functions.
Contribution
It develops the first federated multi-task RL algorithms with provable convergence, extends them to function approximation, and handles the constraint that each agent communicates only with its neighbors over a prescribed graph.
Findings
Non-asymptotic convergence guarantees established
Nearly independent of state-action space size
Finite-time sample complexity for function approximation
Abstract
Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories. In this work, we consider a multi-task setting, in which each agent has its own private reward function corresponding to different tasks, while sharing the same transition kernel of the environment. Focusing on infinite-horizon Markov decision processes, the goal is to learn a globally optimal policy that maximizes the sum of the discounted total rewards of all the agents in a decentralized manner, where each agent only communicates with its neighbors over some prescribed graph topology. We develop federated vanilla and entropy-regularized natural policy gradient (NPG) methods in the tabular setting under softmax parameterization, where gradient tracking is applied to estimate the global Q-function to mitigate the impact of imperfect…
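The decentralized update described in the abstract can be illustrated with a minimal sketch. It assumes exact policy evaluation, a tabular softmax policy stored as logits, and a doubly stochastic mixing matrix `W` over the communication graph. The function names, the step size, and the precise form of the update are illustrative assumptions, not the paper's exact algorithm: each agent averages its neighbors' logits, takes an NPG-style step along a tracked estimate `T` of the network-average Q-function, and then refreshes `T` by gradient tracking.

```python
import numpy as np

def q_values(P, r, pi, gamma):
    """Exact Q^pi for a tabular MDP. P: (S, A, S), r: (S, A), pi: (S, A)."""
    S = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state kernel under pi
    r_pi = (pi * r).sum(axis=1)             # expected one-step reward under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return r + gamma * P @ V                # shape (S, A)

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def local_qs(P, rewards, logits, gamma):
    """Each agent evaluates its own policy under its own private reward."""
    return np.stack([
        q_values(P, rewards[n], softmax(logits[n]), gamma)
        for n in range(rewards.shape[0])
    ])

def fed_npg_sketch(P, rewards, W, gamma=0.9, eta=0.1, iters=200):
    """Hypothetical decentralized NPG loop with gradient tracking.

    rewards: (N, S, A) private rewards; W: (N, N) doubly stochastic mixing matrix.
    Returns the per-agent policies, the trackers T, and the local Q-functions.
    """
    N, S, A = rewards.shape
    logits = np.zeros((N, S, A))            # uniform initial policies
    Q = local_qs(P, rewards, logits, gamma)
    T = Q.copy()                            # tracker starts at the local Q
    for _ in range(iters):
        # Mix neighbors' logits, then step along the tracked Q estimate.
        logits = np.einsum("nm,msa->nsa", W, logits) + eta / (1 - gamma) * T
        logits -= logits.max(axis=-1, keepdims=True)   # keep logits bounded
        Q_new = local_qs(P, rewards, logits, gamma)
        # Gradient tracking: mix trackers and add the change in local Q.
        T = np.einsum("nm,msa->nsa", W, T + Q_new - Q)
        Q = Q_new
    return softmax(logits), T, Q
```

Because `W` is doubly stochastic and `T` is initialized at the local Q-functions, the average of the trackers over agents equals the average of the local Q-functions at every iteration, which is what lets each agent follow the global Q-function using only neighbor communication.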
Taxonomy
Topics: Age of Information Optimization · Electric Vehicles and Infrastructure
Methods: Softmax
