Towards an Understanding of Default Policies in Multitask Policy Optimization

Ted Moskovitz; Michael Arbel; Jack Parker-Holder; Aldo Pacchiano

arXiv:2111.02994 · cs.LG · March 24, 2022

TL;DR

This paper explores the role of default policies in multitask reinforcement learning, formally linking the quality of the default policy to its effect on optimization and proposing a new regularized policy optimization (RPO) algorithm with performance guarantees.

Contribution

It takes a first step toward a formal analysis of default policies in multitask settings and introduces a novel RPO algorithm with theoretical performance guarantees.

Findings

1. The quality of the default policy is formally linked to its effect on optimization in multitask RL.
2. A new RPO algorithm for multitask learning is proposed with strong theoretical guarantees.
3. The approach bridges the gap between theory and practice in multitask policy optimization.

Abstract

Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms with strong performance across multiple domains. In this family of methods, agents are trained to maximize cumulative reward while penalizing deviation in behavior from some reference, or default policy. In addition to empirical success, there is a strong theoretical foundation for understanding RPO methods applied to single tasks, with connections to natural gradient, trust region, and variational approaches. However, there is limited formal understanding of desirable properties for default policies in the multitask setting, an increasingly important domain as the field shifts towards training more generally capable agents. Here, we take a first step towards filling this gap by formally linking the quality of the default policy to its effect on optimization.…
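As a rough illustration of the RPO objective described in the abstract, the sketch below assumes the common KL-regularized form: at each step t the reward r_t is reduced by alpha * KL(pi(.|s_t) || pi_0(.|s_t)), where pi is the agent's policy, pi_0 the default policy, alpha a penalty weight and gamma a discount factor, and the penalized rewards are summed with discounting. The function names, the choice of KL divergence and the discrete action space are assumptions made for illustration; this is not the specific algorithm derived in the paper, which may use a different divergence or update rule.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def regularized_return(rewards, policy_probs, default_probs, alpha=0.1, gamma=0.99):
    """Discounted return with a per-step KL penalty toward a default policy.

    Hypothetical helper for illustration only.
    rewards[t]       : scalar reward at step t
    policy_probs[t]  : pi(.|s_t), the agent's action distribution at step t
    default_probs[t] : pi_0(.|s_t), the default policy's action distribution at step t
    """
    total = 0.0
    for t, (r, p, q) in enumerate(zip(rewards, policy_probs, default_probs)):
        # Penalize the reward by how far the agent's behavior deviates from the default.
        total += (gamma ** t) * (r - alpha * kl_divergence(p, q))
    return total


# When the agent matches the default policy, the penalty vanishes and the
# objective reduces to the ordinary discounted return.
uniform = [0.5, 0.5]
print(regularized_return([1.0, 1.0], [uniform, uniform], [uniform, uniform],
                         alpha=1.0, gamma=0.5))
```

The usage line prints 1.5, the plain discounted return 1 + 0.5 * 1, because the KL term is zero when pi equals pi_0. Any deviation from the default policy lowers the objective in proportion to alpha, which is the trade-off between reward and staying close to the default that RPO methods exploit.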

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning