Explaining Fast Improvement in Online Imitation Learning

Xinyan Yan; Byron Boots; Ching-An Cheng

arXiv:2007.02520·cs.LG·February 23, 2021

TL;DR

This paper explains why online imitation learning often improves policies faster than existing theory predicts, showing that a sufficiently expressive policy class both speeds up policy improvement and reduces the policy class bias.

Contribution

The paper provides a theoretical analysis demonstrating that expressive policy classes accelerate policy improvement in online imitation learning.

Findings

01

Policy improvement speed is $\tilde{O}(1/N + \sqrt{\xi/N})$ after $N$ rounds, where $\xi$ is the policy class bias.

02

Expressive policy classes reduce bias and increase learning speed.

03

Theoretical results align with empirical observations of fast policy improvement.

Abstract

Online imitation learning (IL) is an algorithmic framework that leverages interactions with expert policies for efficient policy optimization. Here policies are optimized by performing online learning on a sequence of loss functions that encourage the learner to mimic expert actions, and if the online learning has no regret, the agent can provably learn an expert-like policy. Online IL has demonstrated empirical successes in many applications and, interestingly, its policy improvement speed observed in practice is usually much faster than existing theory suggests. In this work, we provide an explanation of this phenomenon. Let $\xi$ denote the policy class bias and assume the online IL loss functions are convex, smooth, and non-negative. We prove that, after $N$ rounds of online IL with stochastic feedback, the policy improves in $\tilde{O}(1/N + \sqrt{\xi/N})$ in both expectation and…
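
To make the stated rate $\tilde{O}(1/N + \sqrt{\xi/N})$ concrete, the following is a minimal Python sketch that evaluates the expression $1/N + \sqrt{\xi/N}$ for a few values of the bias $\xi$. It is only an illustration under simplifying assumptions: the constants and logarithmic factors hidden by $\tilde{O}$ are dropped, and the function names `rate_bound` and `crossover_rounds` are hypothetical, not taken from the paper. Setting the two terms equal gives $N = 1/\xi$, so the fast $1/N$ term dominates while $N \le 1/\xi$, and the slower $\sqrt{\xi/N}$ term dominates afterwards.

```python
import math


def rate_bound(n_rounds: int, xi: float) -> float:
    """Evaluate 1/N + sqrt(xi/N), ignoring constants and log factors.

    This is an illustrative stand-in for the O-tilde rate, not the
    paper's exact bound.
    """
    if n_rounds <= 0:
        raise ValueError("n_rounds must be a positive integer")
    if xi < 0:
        raise ValueError("xi (policy class bias) must be non-negative")
    return 1.0 / n_rounds + math.sqrt(xi / n_rounds)


def crossover_rounds(xi: float) -> float:
    """Return the N at which 1/N equals sqrt(xi/N), i.e. N = 1/xi.

    For xi == 0 the 1/N term dominates for every N, so return infinity.
    """
    if xi < 0:
        raise ValueError("xi (policy class bias) must be non-negative")
    return math.inf if xi == 0 else 1.0 / xi


if __name__ == "__main__":
    # Hypothetical bias values chosen only to show the two regimes.
    for xi in (0.0, 1e-4, 1e-1):
        print(f"xi = {xi:g}, crossover at N = {crossover_rounds(xi):g}")
        for n in (10, 100, 1000, 10000):
            print(f"  N = {n:>5d}  bound = {rate_bound(n, xi):.5f}")
```

With $\xi = 0$ the expression decays like $1/N$ throughout, while a larger $\xi$ makes the $\sqrt{\xi/N}$ term take over after fewer rounds. This matches the qualitative claim that a more expressive policy class (smaller $\xi$) yields faster improvement.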

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings