Natural Policy Gradient and Actor Critic Methods for Constrained Multi-Task Reinforcement Learning

Sihan Zeng; Thinh T. Doan; Justin Romberg

arXiv:2405.02456 · math.OC · May 7, 2024

TL;DR

This paper introduces a constrained multi-task reinforcement learning framework and develops primal-dual and actor-critic algorithms for both centralized and decentralized settings, with convergence guarantees and extensions to function approximation.

Contribution

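The paper proposes a constrained formulation for multi-task RL, in which the average performance across tasks is maximized subject to bounds on the performance in each task, and develops algorithms with convergence guarantees, including extensions to function approximation.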
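
As a rough illustration, the constrained formulation can be written as follows. The notation is assumed here and not taken from the paper: N is the number of tasks, V_i(π) is the expected return of policy π on task i, and ℓ_i is a per-task lower bound. The source speaks only of "bounds", so the choice of lower bounds is an assumption; upper bounds could be handled the same way.

```latex
% Notation (N, V_i, \ell_i, \lambda_i) is assumed for illustration only.
\begin{aligned}
\max_{\pi} \quad & \frac{1}{N} \sum_{i=1}^{N} V_i(\pi) \\
\text{subject to} \quad & V_i(\pi) \ge \ell_i, \qquad i = 1, \dots, N.
\end{aligned}

% Lagrangian with one multiplier per task constraint:
L(\pi, \lambda) = \frac{1}{N} \sum_{i=1}^{N} V_i(\pi)
  + \sum_{i=1}^{N} \lambda_i \bigl( V_i(\pi) - \ell_i \bigr),
\qquad \lambda_i \ge 0.
```

Under this reading, a primal-dual method would look for a saddle point of L, ascending in the policy π and descending in the multipliers λ.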

Findings

01. The primal-dual algorithm converges to the global optimum with exact gradients.

02. The actor-critic algorithm effectively finds optimal policies using online samples.

03. Extensions to linear function approximation are successfully demonstrated.

Abstract

Multi-task reinforcement learning (RL) aims to find a single policy that effectively solves multiple tasks at the same time. This paper presents a constrained formulation for multi-task RL where the goal is to maximize the average performance of the policy across tasks subject to bounds on the performance in each task. We consider solving this problem both in the centralized setting, where information for all tasks is accessible to a single server, and in the decentralized setting, where a network of agents, each given one task and observing local information, cooperate to find the solution of the globally constrained objective using local communication. We first propose a primal-dual algorithm that provably converges to the globally optimal solution of this constrained formulation under exact gradient evaluations. When the gradient is unknown, we further develop a sample-based…
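
The primal-dual idea with exact gradients can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' algorithm: each task is a single-state problem (a bandit) with a known reward vector, the policy is a softmax over actions, the constraints are lower bounds on each task's value, and the reported policy is the average of the iterates. The function name `primal_dual_npg`, the reward table, the bounds, and the step sizes are all hypothetical.

```python
import numpy as np


def softmax(theta):
    """Numerically stable softmax over action preferences."""
    shifted = theta - theta.max()
    weights = np.exp(shifted)
    return weights / weights.sum()


def primal_dual_npg(rewards, lower_bounds, steps=5000,
                    eta_theta=0.1, eta_lambda=0.1, lambda_max=10.0):
    """Toy primal-dual loop (hypothetical helper, illustration only).

    rewards[i, a] is the reward of action a in task i; the value of a
    policy pi on task i is rewards[i] @ pi. Returns the averaged policy
    and the final dual variables.
    """
    num_tasks, num_actions = rewards.shape
    theta = np.zeros(num_actions)   # softmax policy parameters
    lam = np.zeros(num_tasks)       # one multiplier per task constraint
    avg_policy = np.zeros(num_actions)

    for _ in range(steps):
        pi = softmax(theta)
        # Reward of the Lagrangian: task average plus multiplier-weighted tasks.
        lagrangian_reward = rewards.mean(axis=0) + lam @ rewards
        # For a tabular softmax policy in a single-state problem, the natural
        # policy gradient step adds the advantage to the parameters.
        theta = theta + eta_theta * (lagrangian_reward - pi @ lagrangian_reward)

        pi = softmax(theta)
        values = rewards @ pi
        # Dual step: raise lam[i] when task i violates its bound, then project
        # back onto the interval [0, lambda_max].
        lam = np.clip(lam - eta_lambda * (values - lower_bounds), 0.0, lambda_max)

        avg_policy += pi / steps

    return avg_policy, lam


# Two tasks, two actions. Maximizing the average alone would always pick
# action 0, which gives task 1 a value of 0 and violates its bound of 0.2.
rewards = np.array([[1.0, 0.0],
                    [0.0, 0.4]])
lower_bounds = np.array([0.0, 0.2])

avg_policy, lam = primal_dual_npg(rewards, lower_bounds)
task_values = rewards @ avg_policy
print("averaged policy:", avg_policy)
print("task values:", task_values, "average:", task_values.mean())
```

In this toy instance the constrained optimum splits probability evenly between the two actions, giving task values of 0.5 and 0.2. The individual iterates of such a loop can oscillate around the saddle point, which is why the sketch reports the averaged policy; whether the paper's guarantees concern the last iterate or an average is not stated in the text above.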

Taxonomy

Topics: Reinforcement Learning in Robotics