On the Convergence Theory of Meta Reinforcement Learning with Personalized Policies

Haozhi Wang; Qing Wang; Yunfeng Shao; Dong Li; Jianye Hao; Yinchuan Li

arXiv:2209.10072 · cs.AI · September 22, 2022

TL;DR

This paper introduces a personalized meta-reinforcement learning algorithm that addresses gradient conflicts by maintaining task-specific policies, with proven convergence and superior performance on benchmark control tasks.
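As a toy illustration of what a gradient conflict means (the gradients below are made-up two-dimensional numbers, not taken from the paper), consider two tasks whose policy gradients point in nearly opposite directions. A single shared update along their average can improve one task while degrading the other, which is the failure mode personalized policies are meant to avoid.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    # Cosine similarity; a negative value means the two gradients conflict.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def average(grads):
    # Element-wise mean of the per-task gradients (one shared update direction).
    n = len(grads)
    return [sum(g[k] for g in grads) / n for k in range(len(grads[0]))]

# Hypothetical per-task policy gradients (ascent directions) at shared parameters.
g1 = [1.0, 0.0]
g2 = [-3.0, 1.0]

g_meta = average([g1, g2])  # [-1.0, 0.5]

# To first order, a small step along g_meta changes task i's return by g_meta . g_i.
print("cos(g1, g2) =", round(cosine(g1, g2), 3))  # -0.949
print("task 1 change:", dot(g_meta, g1))          # -1.0, task 1 gets worse
print("task 2 change:", dot(g_meta, g2))          # 3.5, task 2 gets better
```

The larger-magnitude gradient of task 2 dominates the average, so the shared step moves against task 1.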

Contribution

It proposes a novel pMeta-RL algorithm with a theoretical convergence analysis and extends it to deep networks for continuous control, outperforming existing Meta-RL methods.

Findings

1. pMeta-RL converges under the tabular setting.
2. The deep version improves performance on Gym and MuJoCo tasks.
3. Personalized policies enhance task-specific adaptation.

Abstract

Modern meta-reinforcement learning (Meta-RL) methods are mainly developed based on model-agnostic meta-learning, which performs policy gradient steps across tasks to maximize policy performance. However, the gradient conflict problem, which may lead to performance degradation when tasks are distinct, is still poorly understood in Meta-RL. To tackle this challenge, this paper proposes a novel personalized Meta-RL (pMeta-RL) algorithm, which aggregates task-specific personalized policies to update a meta-policy used for all tasks, while maintaining personalized policies to maximize the average return of each task under the constraint of the meta-policy. We also provide a theoretical analysis under the tabular setting, which demonstrates the convergence of our pMeta-RL algorithm. Moreover, we extend the proposed pMeta-RL algorithm to a deep network version based on soft…
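The loop described in the abstract (personalized policies per task, aggregated into a shared meta-policy that in turn constrains them) can be sketched in the tabular setting. This summary does not give the paper's exact update rules, so the sketch makes two loudly labeled assumptions: the "constraint of the meta-policy" is modeled as a KL penalty with weight `lam`, giving the closed-form improvement pi_i(a|s) ∝ meta(a|s) exp(Q_i(s,a)/lam), and aggregation is a plain per-state average. All function names (`make_task`, `personalize`, `aggregate`, `p_meta_rl`) are hypothetical.

```python
import math
import random

def make_task(rng, n_states, n_actions):
    """Random tabular MDP: P[s][a] is a next-state distribution, R[s][a] a reward."""
    P = []
    for s in range(n_states):
        row = []
        for a in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            z = sum(w)
            row.append([x / z for x in w])
        P.append(row)
    R = [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]
    return P, R

def evaluate_q(P, R, pi, gamma, iters=200):
    """Iterative policy evaluation: repeatedly apply the Bellman expectation backup for pi."""
    nS, nA = len(R), len(R[0])
    Q = [[0.0] * nA for _ in range(nS)]
    for _ in range(iters):
        V = [sum(pi[s][a] * Q[s][a] for a in range(nA)) for s in range(nS)]
        Q = [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(nS))
              for a in range(nA)] for s in range(nS)]
    return Q

def personalize(meta, Q, lam):
    """ASSUMPTION: KL-regularized improvement, pi(a|s) proportional to meta(a|s) * exp(Q(s,a)/lam)."""
    out = []
    for s, q in enumerate(Q):
        m = max(q)  # subtract the max for numerical stability
        w = [meta[s][a] * math.exp((q[a] - m) / lam) for a in range(len(q))]
        z = sum(w)
        out.append([x / z for x in w])
    return out

def aggregate(policies):
    """ASSUMPTION: meta-policy is the per-state average of the personalized policies."""
    n = len(policies)
    return [[sum(p[s][a] for p in policies) / n for a in range(len(policies[0][s]))]
            for s in range(len(policies[0]))]

def p_meta_rl(tasks, n_states, n_actions, gamma=0.9, lam=1.0, rounds=20):
    # Start from a uniform meta-policy shared by all tasks.
    meta = [[1.0 / n_actions] * n_actions for _ in range(n_states)]
    personal = [meta for _ in tasks]
    for _ in range(rounds):
        # Each task improves its own policy, anchored to the current meta-policy.
        personal = [personalize(meta, evaluate_q(P, R, pi, gamma), lam)
                    for (P, R), pi in zip(tasks, personal)]
        # The meta-policy is rebuilt from the personalized policies.
        meta = aggregate(personal)
    return meta, personal

rng = random.Random(0)
tasks = [make_task(rng, 3, 2) for _ in range(4)]
meta, personal = p_meta_rl(tasks, n_states=3, n_actions=2)
```

A smaller `lam` lets each personalized policy move further from the meta-policy toward its own task, while a larger `lam` keeps all tasks close to the shared policy; this trade-off is the knob the sketch uses to stand in for the paper's constraint.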
