Explaining Fast Improvement in Online Imitation Learning
Xinyan Yan, Byron Boots, Ching-An Cheng

TL;DR
This paper explains why online imitation learning often improves policies faster than theory predicts, showing that expressive policy classes enhance both speed and accuracy of learning.
Contribution
The paper provides a theoretical analysis demonstrating that expressive policy classes accelerate policy improvement in online imitation learning.
Findings
Policy improvement speed is $\tilde{O}(1/N + \sqrt{\xi/N})$ after $N$ rounds, where $\xi$ is the policy class bias.
Expressive policy classes reduce bias and increase learning speed.
Theoretical results align with empirical observations of fast policy improvement.
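To make the rate above concrete, the following minimal sketch evaluates $1/N + \sqrt{\xi/N}$ for a few values of $N$ and $\xi$. It drops the constants and logarithmic factors hidden by the $\tilde{O}$ notation, and the function name `rate_bound` is a hypothetical helper introduced here for illustration, not something from the paper.

```python
import math


def rate_bound(n_rounds: int, xi: float) -> float:
    """Evaluate 1/N + sqrt(xi/N), ignoring constants and log factors.

    n_rounds is the number of online IL rounds N; xi is the policy class bias.
    Both names are illustrative assumptions.
    """
    if n_rounds <= 0:
        raise ValueError("n_rounds must be positive")
    if xi < 0:
        raise ValueError("xi must be non-negative")
    return 1.0 / n_rounds + math.sqrt(xi / n_rounds)


if __name__ == "__main__":
    # Compare an unbiased policy class (xi = 0) with biased ones.
    for xi in (0.0, 0.01, 1.0):
        values = [rate_bound(n, xi) for n in (10, 100, 1000)]
        print(f"xi={xi}: {values}")
```

When $\xi = 0$ the expression reduces to $1/N$, and as $\xi$ grows the $\sqrt{\xi/N}$ term dominates, which is the sense in which a more expressive policy class (smaller bias) gives a faster rate.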
Abstract
Online imitation learning (IL) is an algorithmic framework that leverages interactions with expert policies for efficient policy optimization. Here policies are optimized by performing online learning on a sequence of loss functions that encourage the learner to mimic expert actions, and if the online learning has no regret, the agent can provably learn an expert-like policy. Online IL has demonstrated empirical successes in many applications and, interestingly, its policy improvement speed observed in practice is usually much faster than existing theory suggests. In this work, we provide an explanation of this phenomenon. Let $\xi$ denote the policy class bias and assume the online IL loss functions are convex, smooth, and non-negative. We prove that, after $N$ rounds of online IL with stochastic feedback, the policy improves in $\tilde{O}(1/N + \sqrt{\xi/N})$ in both expectation and…
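The online IL loop described in the abstract can be sketched on a toy problem. This is only an illustration under stated assumptions, not the paper's algorithm or experiments: the 1-D linear dynamics, the linear expert `expert_gain * s`, the squared imitation loss, the plain online gradient step, and every name and constant below are hypothetical choices. The expert is representable by the learner's policy class, so the bias is zero in this toy setting.

```python
import numpy as np


def online_il(n_rounds=200, horizon=20, lr=0.05, expert_gain=-0.5, seed=0):
    """Toy online imitation learning with a scalar linear policy a = theta * s.

    Each round: roll out the current learner policy, query the expert on the
    visited states, and take one gradient step on the resulting imitation loss.
    All dynamics and hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    theta = 0.0
    history = []
    for _ in range(n_rounds):
        # Roll out the learner's own policy to collect the states it visits.
        s = rng.normal()
        states = []
        for _ in range(horizon):
            states.append(s)
            a = theta * s
            s = 0.9 * s + a + rng.normal()  # assumed toy dynamics with noise
        states = np.array(states)

        # Expert labels on the learner's states define this round's loss:
        # the mean of 0.5 * (theta * s - a_expert)^2 over visited states.
        expert_actions = expert_gain * states
        grad = np.mean((theta * states - expert_actions) * states)

        # One online gradient descent step on the per-round loss.
        theta -= lr * grad
        history.append(theta)
    return theta, history


if __name__ == "__main__":
    theta, _ = online_il()
    print("learned gain:", theta)
```

The loss in each round depends on the states visited by the current learner, which is what makes the sequence of losses an online learning problem rather than a fixed supervised one.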
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
