Improved Policy Optimization for Online Imitation Learning

Jonathan Wilder Lavington; Sharan Vaswani; Mark Schmidt

arXiv:2208.00088·cs.LG·August 2, 2022

TL;DR

This paper analyzes and improves policy optimization algorithms for online imitation learning, providing theoretical guarantees and practical variants that achieve constant regret under weaker assumptions, with demonstrated effectiveness in experiments.

Contribution

The paper offers a new theoretical analysis of DAGGER under weaker assumptions on the losses, and introduces regularized FTRL variants with constant-regret guarantees for online imitation learning.

Findings

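1. DAGGER achieves constant regret when the policy class contains the expert policy and the losses are strongly convex with respect to the policy's sufficient statistics rather than its parameterization.

2. Regularized FTRL variants also achieve constant regret with expressive policies.

3. Experimental results confirm the effectiveness of the proposed algorithms on control tasks.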

Abstract

We consider online imitation learning (OIL), where the task is to find a policy that imitates the behavior of an expert via active interaction with the environment. We aim to bridge the gap between the theory and practice of policy optimization algorithms for OIL by analyzing one of the most popular OIL algorithms, DAGGER. Specifically, if the class of policies is sufficiently expressive to contain the expert policy, we prove that DAGGER achieves constant regret. Unlike previous bounds that require the losses to be strongly-convex, our result only requires the weaker assumption that the losses be strongly-convex with respect to the policy's sufficient statistics (not its parameterization). In order to ensure convergence for a wider class of policies and losses, we augment DAGGER with an additional regularization term. In particular, we propose a variant of Follow-the-Regularized-Leader…
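As a rough illustration of the interaction loop the abstract describes, the following Python sketch runs DAGGER as Follow-the-Leader on a toy one-dimensional problem. Everything specific in it is an assumption made for illustration and is not taken from the paper or its repository: the linear expert `expert_action`, the dynamics in `rollout`, the squared imitation loss, and the optional `l2` penalty, which is a generic ridge term standing in for "an additional regularization term" and is not the paper's FTRL variant.

```python
import random


def expert_action(state):
    # Hypothetical expert: a linear controller that pulls the state toward zero.
    return -0.5 * state


def rollout(theta, horizon, rng):
    # Run the learner's policy a = theta * s under assumed dynamics s' = s + a
    # and record the states it visits.
    state = rng.uniform(0.5, 1.5)
    states = []
    for _ in range(horizon):
        states.append(state)
        state = state + theta * state
    return states


def fit_leader(dataset, l2=0.0):
    # Closed-form minimizer of sum((theta * s - a)^2) + l2 * theta^2 over all
    # aggregated (state, expert action) pairs. With l2 = 0 this is plain
    # Follow-the-Leader; l2 > 0 is an illustrative ridge penalty only.
    numerator = sum(s * a for s, a in dataset)
    denominator = sum(s * s for s, _ in dataset) + l2
    return numerator / denominator


def dagger(rounds=5, horizon=10, l2=0.0, seed=0):
    rng = random.Random(seed)
    theta = 0.0      # initial learner parameter
    dataset = []     # aggregated (state, expert action) pairs
    losses = []      # imitation loss of each round's policy on its own states
    for _ in range(rounds):
        # 1. Visit states under the current learner policy.
        states = rollout(theta, horizon, rng)
        # 2. Ask the expert what it would have done in those states.
        labelled = [(s, expert_action(s)) for s in states]
        # 3. Record the loss the current policy incurs on those states.
        loss = sum((theta * s - a) ** 2 for s, a in labelled) / len(labelled)
        losses.append(loss)
        # 4. Aggregate the data and refit on everything seen so far.
        dataset.extend(labelled)
        theta = fit_leader(dataset, l2)
    return theta, losses


if __name__ == "__main__":
    theta, losses = dagger()
    print("learned theta:", theta)
    print("per-round losses:", losses)
```

Because the linear policy class contains the expert in this toy setup, the per-round loss in `losses` is positive only in the first round and zero afterwards, so the cumulative loss stops growing. That is the flavour of a constant-regret statement, though only for this hand-built example. With `l2 > 0` the fit is shrunk toward zero and no longer matches the expert exactly.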

Code & Models

Repositories

wilderlavington/improved-policy-optimization-for-online-imitation-learning
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications