Improved Policy Optimization for Online Imitation Learning
Jonathan Wilder Lavington, Sharan Vaswani, Mark Schmidt

TL;DR
This paper analyzes and improves policy optimization algorithms for online imitation learning, providing theoretical guarantees and practical variants that achieve constant regret under weaker assumptions, with demonstrated effectiveness in experiments.
Contribution
It offers new theoretical analysis of DAGGER with weaker loss assumptions and introduces regularized FTRL variants with constant regret guarantees for online imitation learning.
Findings
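DAGGER achieves constant regret when the policy class contains the expert policy and the losses are strongly convex with respect to the policy's sufficient statistics, a weaker assumption than strong convexity in the parameters.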
Regularized FTRL variants also achieve constant regret with expressive policies.
Experimental results confirm the effectiveness of the proposed algorithms on control tasks.
Abstract
We consider online imitation learning (OIL), where the task is to find a policy that imitates the behavior of an expert via active interaction with the environment. We aim to bridge the gap between the theory and practice of policy optimization algorithms for OIL by analyzing one of the most popular OIL algorithms, DAGGER. Specifically, if the class of policies is sufficiently expressive to contain the expert policy, we prove that DAGGER achieves constant regret. Unlike previous bounds that require the losses to be strongly-convex, our result only requires the weaker assumption that the losses be strongly-convex with respect to the policy's sufficient statistics (not its parameterization). In order to ensure convergence for a wider class of policies and losses, we augment DAGGER with an additional regularization term. In particular, we propose a variant of Follow-the-Regularized-Leader…
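To make the DAGGER loop described in the abstract concrete, here is a minimal sketch and not the authors' implementation. It assumes a toy 1-D linear system, a hypothetical linear expert with gain -0.5, a linear learner policy a = theta * s, and a squared imitation loss; all of these names and constants are illustrative choices, not taken from the paper. Each round rolls out the current learner policy, asks the expert to label the visited states, aggregates the data, and refits on everything collected so far (a follow-the-leader update).

```python
import numpy as np

def expert_action(state, gain=0.5):
    # Hypothetical expert: a linear feedback controller (assumption).
    return -gain * state

def rollout(theta, horizon, rng):
    # Roll out the learner's policy a = theta * s in a toy 1-D system
    # and record the states it visits.
    state = rng.normal()
    states = []
    for _ in range(horizon):
        states.append(state)
        action = theta * state
        # Toy dynamics (assumption): contraction plus control plus noise.
        state = 0.9 * state + action + 0.1 * rng.normal()
    return np.array(states)

def dagger(num_rounds=10, horizon=20, seed=0):
    rng = np.random.default_rng(seed)
    theta = 0.0  # initial learner policy
    all_states, all_actions = [], []
    for _ in range(num_rounds):
        # 1. Collect states under the current learner policy.
        states = rollout(theta, horizon, rng)
        # 2. Query the expert for the action at each visited state.
        all_states.append(states)
        all_actions.append(expert_action(states))
        # 3. Refit on the aggregated dataset (least squares in closed form).
        S = np.concatenate(all_states)
        A = np.concatenate(all_actions)
        theta = float(S @ A / (S @ S))
    return theta

theta_hat = dagger()
```

Because the linear policy class contains the expert in this toy setup and the expert labels are noise-free, `theta_hat` should recover the expert gain of -0.5 up to floating-point error. This is the realizable regime the abstract refers to, although the sketch does not measure regret itself.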
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications
