Accelerating Imitation Learning with Predictive Models

Ching-An Cheng; Xinyan Yan; Evangelos A. Theodorou; Byron Boots

arXiv:1806.04642·cs.LG·October 16, 2018

TL;DR

This paper introduces two model-based algorithms that use predictive models to significantly accelerate the convergence of online imitation learning, improving sample efficiency in reinforcement learning tasks.

Contribution

The paper proposes two algorithms, MoBIL-VI and MoBIL-Prox, that use learned predictive models to speed up the convergence of online imitation learning.

Findings

01. Algorithms achieve provable acceleration of convergence rates.

02. Model-based methods improve sample efficiency.

03. The approach generalizes stochastic Mirror-Prox.

Abstract

Sample efficiency is critical in solving real-world reinforcement learning problems, where agent-environment interactions can be costly. Imitation learning from expert advice has proved to be an effective strategy for reducing the number of interactions required to train a policy. Online imitation learning, which interleaves policy evaluation and policy optimization, is a particularly effective technique with provable performance guarantees. In this work, we seek to further accelerate the convergence rate of online imitation learning, thereby making it more sample efficient. We propose two model-based algorithms inspired by Follow-the-Leader (FTL) with prediction: MoBIL-VI based on solving variational inequalities and MoBIL-Prox based on stochastic first-order updates. These two methods leverage a model to predict future gradients to speed up policy learning. When the model oracle is…
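The idea of using a model to predict future gradients can be illustrated with a toy two-step update in the spirit of Mirror-Prox. The following is a minimal sketch under stated assumptions, not the paper's MoBIL-VI or MoBIL-Prox: it uses a fixed scalar quadratic loss, a constant step size, exact gradients, and a "model" that is simply a callable returning a predicted gradient. All names (`predictive_online_descent`, `predicted_grad`, `TARGET`) are hypothetical and chosen for illustration.

```python
from typing import Callable, List

Gradient = Callable[[float], float]


def predictive_online_descent(
    true_grad: Gradient,
    predicted_grad: Gradient,
    x0: float,
    step_size: float,
    rounds: int,
) -> List[float]:
    """Return the decisions played over `rounds` rounds (toy scalar setting)."""
    h = x0  # auxiliary iterate, updated only with observed gradients
    played = []
    for _ in range(rounds):
        # Look-ahead step: choose the decision using the model's predicted gradient.
        x = h - step_size * predicted_grad(h)
        played.append(x)
        # Correction step: update the auxiliary iterate with the gradient observed at x.
        h = h - step_size * true_grad(x)
    return played


def cumulative_loss(decisions: List[float], target: float) -> float:
    """Sum of the quadratic losses 0.5 * (x - target)**2 over all played decisions."""
    return sum(0.5 * (x - target) ** 2 for x in decisions)


TARGET = 2.0  # assumed minimizer of the toy loss


def grad(x: float) -> float:
    """Gradient of the toy loss 0.5 * (x - TARGET)**2."""
    return x - TARGET


# A perfect model predicts the true gradient; a null model predicts zero,
# which reduces the update to plain online gradient descent.
with_model = predictive_online_descent(grad, grad, x0=0.0, step_size=0.5, rounds=50)
without_model = predictive_online_descent(grad, lambda x: 0.0, x0=0.0, step_size=0.5, rounds=50)

print(cumulative_loss(with_model, TARGET), cumulative_loss(without_model, TARGET))
```

In this toy setting, the run with an accurate predicted gradient accumulates less loss than the run with no prediction, because each played decision is already one step ahead of the auxiliary iterate. The example says nothing about the stochastic setting or about how prediction errors affect the rates analyzed in the paper.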
