Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning

Tong Yang; Shicong Cen; Yuting Wei; Yuxin Chen; Yuejie Chi

arXiv:2311.00201 · cs.LG · August 19, 2024 · 1 citation

TL;DR

This paper introduces federated natural policy gradient and actor-critic methods for multi-task reinforcement learning. The methods achieve nearly dimension-free convergence guarantees in a decentralized setting where multiple agents share a transition kernel but hold private reward functions.

Contribution

The paper develops the first federated multi-task RL algorithms with provable convergence and extends them to function approximation, in a setting where each agent communicates only with its neighbors over a prescribed graph.

Findings

01. Non-asymptotic convergence guarantees established
02. Nearly independent of state-action space size
03. Finite-time sample complexity for function approximation

Abstract

Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories. In this work, we consider a multi-task setting, in which each agent has its own private reward function corresponding to different tasks, while sharing the same transition kernel of the environment. Focusing on infinite-horizon Markov decision processes, the goal is to learn a globally optimal policy that maximizes the sum of the discounted total rewards of all the agents in a decentralized manner, where each agent only communicates with its neighbors over some prescribed graph topology. We develop federated vanilla and entropy-regularized natural policy gradient (NPG) methods in the tabular setting under softmax parameterization, where gradient tracking is applied to estimate the global Q-function to mitigate the impact of imperfect…
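
The update structure described in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' exact algorithm: it assumes exact policy evaluation, a doubly stochastic mixing matrix W, and the unregularized (vanilla) update only. The function names (softmax, q_values, fed_npg_sketch) and the step size eta are hypothetical choices made for illustration. Each agent keeps its own softmax logits and a tracked estimate T of the network-average Q-function. Both are mixed with the neighbors' values, and T is corrected by the change in the agent's local Q-function, which is the standard gradient-tracking recursion.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over the action axis (last axis)."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def q_values(P, r, pi, gamma):
    """Exact Q^pi for a tabular MDP. P: (S, A, S), r: (S, A), pi: (S, A)."""
    S = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)      # state-to-state kernel under pi
    r_pi = (pi * r).sum(axis=1)                # expected one-step reward under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return r + gamma * P @ V                   # Q(s, a) = r(s, a) + gamma * E[V(s')]

def fed_npg_sketch(P, rewards, W, gamma=0.9, eta=0.1, iters=50):
    """Illustrative decentralized NPG with Q-tracking (assumed form, see lead-in).

    P: shared transition kernel (S, A, S); rewards: private rewards (N, S, A);
    W: doubly stochastic mixing matrix (N, N) matching the communication graph.
    """
    N, S, A = rewards.shape
    logits = np.zeros((N, S, A))               # every agent starts from the uniform policy
    pis = softmax(logits)
    Q = np.stack([q_values(P, rewards[n], pis[n], gamma) for n in range(N)])
    T = Q.copy()                               # tracker initialized at the local Q-functions
    for _ in range(iters):
        # NPG step in logit space using the tracked Q, then mix with neighbors.
        logits = np.einsum("nm,msa->nsa", W, logits + eta * T)
        new_pis = softmax(logits)
        new_Q = np.stack([q_values(P, rewards[n], new_pis[n], gamma) for n in range(N)])
        # Gradient tracking: mix the tracker and add the change in the local Q.
        T = np.einsum("nm,msa->nsa", W, T + new_Q - Q)
        pis, Q = new_pis, new_Q
    return pis, T, Q
```

With a doubly stochastic W, the average of T over agents equals the average of the local Q-functions at every iteration, so each agent's tracker is a proxy for the Q-function of the averaged reward without any agent revealing its own reward. Averaging rather than summing the rewards only rescales the objective and leaves the optimal policy unchanged.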

Videos

Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning · SlidesLive

Taxonomy

Topics: Age of Information Optimization · Electric Vehicles and Infrastructure

Methods: Softmax