Provably Efficient Imitation Learning from Observation Alone

Wen Sun; Anirudh Vemula; Byron Boots; J. Andrew Bagnell

arXiv:1905.10948 · cs.LG · June 12, 2019 · 20 cites

TL;DR

This paper introduces FAIL (Forward Adversarial Imitation Learning), a model-free algorithm for imitation learning from observations alone. FAIL is provably sample-efficient in large-scale MDPs and performs well empirically, extending the scope of provably sample-efficient learning algorithms.

Contribution

The paper presents the first provably efficient algorithm for imitation learning (IL) from observations alone. It learns near-optimal policies with a sample complexity that is polynomial in all relevant parameters and independent of the number of unique observations.
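Why can a sample complexity avoid depending on the number of unique observations? A rough intuition, using a standard concentration argument rather than the paper's actual theorem: if the learner compares distributions only through a finite class F of test functions with values in [0, 1], Hoeffding's inequality plus a union bound says n samples estimate every mean E[f] to within epsilon, with probability at least 1 − delta, once n ≥ ln(2|F|/delta) / (2 epsilon²). The size of the observation space never enters. The helper name `samples_needed` is hypothetical.

```python
import math


def samples_needed(num_functions: int, epsilon: float, delta: float) -> int:
    """Samples n so that, with probability >= 1 - delta, every empirical mean
    of a finite class of num_functions test functions (values in [0, 1]) is
    within epsilon of its true mean. Uses Hoeffding's inequality plus a union
    bound. The number of distinct observations appears nowhere."""
    if num_functions < 1 or epsilon <= 0 or not 0 < delta < 1:
        raise ValueError("need num_functions >= 1, epsilon > 0, 0 < delta < 1")
    # Hoeffding: P(|empirical mean - E[f]| > eps) <= 2 * exp(-2 * n * eps**2).
    # Union bound over the class: require num_functions * 2 * exp(...) <= delta.
    return math.ceil(math.log(2 * num_functions / delta) / (2 * epsilon ** 2))


for size in (10, 1_000, 1_000_000):
    print(size, samples_needed(size, epsilon=0.1, delta=0.05))
# 10 300
# 1000 530
# 1000000 876
```

Growing the test-function class from ten to a million members less than triples n, because the class size enters only through a logarithm. How rich F must be to separate the distributions that matter is a separate question this calculation ignores.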

Findings

01

FAIL learns near-optimal policies in large-scale MDPs.

02

The algorithm's sample complexity is polynomial in all relevant parameters and independent of the number of unique observations.

03

FAIL demonstrates strong empirical performance on OpenAI Gym tasks.

Abstract

We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in this setting the expert only supplies sequences of observations. We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner. FAIL is the first provably efficient algorithm in the ILFO setting, which learns a near-optimal policy with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The resulting theory extends the domain of provably sample-efficient learning algorithms beyond existing results, which typically only consider tabular reinforcement…
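The forward structure described in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration: a hypothetical 4-observation chain MDP, a finite policy class containing every deterministic map from observation to action, and threshold tests as the discriminator class F. The empirical IPM between two samples is the largest gap |mean f(expert) − mean f(learner)| over f in F. At each time step h, with earlier policies frozen, the sketch picks pi_h to minimize this IPM between the learner's and the expert's observation distributions at step h+1, using expert observations only and never expert actions. FAIL, as its name suggests, treats each step adversarially (a min over policies of a max over discriminators); this sketch replaces that optimization with brute-force enumeration over a tiny policy class, so it shows the forward loop, not the paper's optimizer. The names `fail_sketch`, `sample_trajectories`, and `empirical_ipm` are hypothetical.

```python
import itertools
import random

# Hypothetical toy chain MDP: observations 0..N_OBS-1, start at x_0 = 0.
# Action 1 tries to move right, action 0 tries to move left; a move succeeds
# with probability 0.9, otherwise the agent stays put. Positions are clipped.
N_OBS, HORIZON, N_SAMPLES = 4, 3, 500


def step(rng, obs, action):
    if rng.random() < 0.9:
        obs = obs + 1 if action == 1 else obs - 1
    return min(max(obs, 0), N_OBS - 1)


def sample_trajectories(rng, policies, n):
    """Run the time-dependent policies (one per step) n times and return the
    observation sequences [x_0, x_1, ...]; actions are not recorded."""
    trajs = []
    for _ in range(n):
        obs, traj = 0, [0]
        for pi in policies:  # pi is a tuple mapping observation -> action
            obs = step(rng, obs, pi[obs])
            traj.append(obs)
        trajs.append(traj)
    return trajs


# Assumed finite discriminator class F: threshold tests f_t(x) = 1[x >= t].
DISCRIMINATORS = [lambda x, t=t: float(x >= t) for t in range(1, N_OBS)]


def empirical_ipm(xs, ys):
    """Empirical IPM: max over f in F of |mean f(xs) - mean f(ys)|."""
    return max(abs(sum(map(f, xs)) / len(xs) - sum(map(f, ys)) / len(ys))
               for f in DISCRIMINATORS)


def fail_sketch(seed=0):
    rng = random.Random(seed)
    expert = (1,) * N_OBS  # always moves right; the learner sees only observations
    expert_trajs = sample_trajectories(rng, [expert] * HORIZON, N_SAMPLES)
    # Finite policy class: every deterministic map from observation to action.
    candidates = list(itertools.product((0, 1), repeat=N_OBS))
    chosen = []
    for h in range(HORIZON):
        expert_next = [traj[h + 1] for traj in expert_trajs]  # expert x_{h+1}

        def ipm_if_chosen(pi):
            # Freeze pi_0..pi_{h-1}, try pi as pi_h, compare x_{h+1} samples.
            trajs = sample_trajectories(rng, chosen + [pi], N_SAMPLES)
            return empirical_ipm(expert_next, [traj[-1] for traj in trajs])

        chosen.append(min(candidates, key=ipm_if_chosen))
    return chosen


for h, pi in enumerate(fail_sketch()):
    print(f"pi_{h}: action per observation {pi}")
```

Only the entries of each pi_h at observations the learner reaches with noticeable probability are pinned down; entries at rarely visited observations barely change the matched distribution and may differ from the expert's across seeds.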

Code & Models

Repositories

wensun/Imitation-Learning-from-Observation (official, TensorFlow)

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Advanced Bandit Algorithms Research