Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction

Wen Sun; Arun Venkatraman; Geoffrey J. Gordon; Byron Boots; J. Andrew Bagnell

arXiv:1703.01030·cs.LG·March 6, 2017·85 cites

Open Access

TL;DR

AggreVaTeD is a differentiable imitation learning method that uses a near-optimal oracle, available during training, to reach better solutions faster and with less data than a less-informed reinforcement learning approach on sequential prediction and control tasks.

Contribution

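The paper extends AggreVaTe, the imitation learning approach of Ross & Bagnell (2014), to differentiable policies trained with policy gradients, and supports the resulting algorithm, AggreVaTeD, with empirical results and a theoretical analysis of its sample complexity relative to RL.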

Findings

01. Reaches better solutions faster than a less-informed RL baseline, using less training data.

02. Can outperform the oracle itself when the demonstrator is sub-optimal.

03. Provides a theoretical analysis indicating up to exponentially lower sample complexity than RL.

Abstract

Researchers have demonstrated state-of-the-art performance in sequential decision making problems (e.g., robotics control, sequential prediction) with deep neural network models. One often has access to near-optimal oracles that achieve good performance on the task during training. We demonstrate that AggreVaTeD, a policy gradient extension of the Imitation Learning (IL) approach of (Ross & Bagnell, 2014), can leverage such an oracle to achieve faster and better solutions with less training data than a less-informed Reinforcement Learning (RL) technique. Using both feedforward and recurrent neural network predictors, we present stochastic gradient procedures on a sequential prediction task, dependency parsing from raw image data, as well as on various high-dimensional robotics control problems. We also provide a comprehensive theoretical study of IL that demonstrates we can expect…
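The idea of a policy gradient driven by an oracle's cost-to-go can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy chain environment, the hand-written `oracle_cost_to_go`, the tabular softmax policy, and the simplification of summing the exact gradient over every roll-in state instead of sampling time steps.

```python
import numpy as np

N_STATES, GOAL, HORIZON = 6, 5, 10   # toy chain: states 0..5, goal at the right end
LEFT, RIGHT = 0, 1

def step(s, a):
    """Deterministic chain dynamics, clipped to [0, GOAL]."""
    return min(GOAL, s + 1) if a == RIGHT else max(0, s - 1)

def oracle_cost_to_go(s, a):
    """Hypothetical oracle Q*(s, a): one unit of cost for this step plus the
    optimal remaining distance to the goal from the next state."""
    if s == GOAL:
        return 0.0
    return 1.0 + (GOAL - step(s, a))

def policy(theta, s):
    """Tabular softmax policy over {LEFT, RIGHT} for state s."""
    z = theta[s] - theta[s].max()
    p = np.exp(z)
    return p / p.sum()

def train(iters=200, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros((N_STATES, 2))
    for _ in range(iters):
        s = 0
        grad = np.zeros_like(theta)
        for _ in range(HORIZON):          # roll in with the current policy
            if s == GOAL:
                break
            p = policy(theta, s)
            q = np.array([oracle_cost_to_go(s, a) for a in (LEFT, RIGHT)])
            # Exact gradient of sum_a pi(a|s) Q*(s, a) w.r.t. the logits of s.
            grad[s] += p * (q - p @ q)
            s = step(s, rng.choice(2, p=p))
        theta -= lr * grad                # descend on expected oracle cost-to-go
    return theta

if __name__ == "__main__":
    theta = train()
    for s in range(GOAL):
        print(s, policy(theta, s).round(3))
```

The learner never observes a reward signal of its own here; the only training signal is the oracle's cost-to-go evaluated on states the learner itself visits, which is what distinguishes this style of update from a plain RL policy gradient.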

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning