Provably Efficient Imitation Learning from Observation Alone
Wen Sun, Anirudh Vemula, Byron Boots, J. Andrew Bagnell

TL;DR
This paper introduces FAIL (Forward Adversarial Imitation Learning), a model-free algorithm for imitation learning from observations alone. FAIL is provably sample-efficient in large-scale MDPs, extending provably efficient learning beyond existing results.
Contribution
The paper presents the first provably efficient algorithm for IL from observations, capable of learning near-optimal policies with polynomial sample complexity independent of observation space size.
Findings
FAIL learns near-optimal policies in large-scale MDPs.
The algorithm's sample complexity is polynomial and independent of the number of observations.
FAIL demonstrates strong empirical performance on OpenAI Gym tasks.
Abstract
We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in this setting the expert only supplies sequences of observations. We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner. FAIL is the first provably efficient algorithm in the ILFO setting: it learns a near-optimal policy with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The resulting theory extends the domain of provably sample efficient learning algorithms beyond existing results, which typically only consider tabular reinforcement…
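To make the Integral Probability Metric (IPM) mentioned in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes, purely for illustration, that the discriminator class is the set of linear functions x -> w·x with ||w||_2 <= 1, for which the supremum over discriminators has a closed form: the Euclidean norm of the difference between the two sample means. The function name `linear_ipm` and the synthetic Gaussian "observations" are hypothetical.

```python
import numpy as np


def linear_ipm(expert_obs, learner_obs):
    """Empirical IPM over the assumed class {x -> w.x : ||w||_2 <= 1}.

    For this class, sup_w w.(mean_expert - mean_learner) equals the
    Euclidean norm of the difference of the two sample means.
    """
    expert_obs = np.asarray(expert_obs, dtype=float)
    learner_obs = np.asarray(learner_obs, dtype=float)
    gap = expert_obs.mean(axis=0) - learner_obs.mean(axis=0)
    return float(np.linalg.norm(gap))


# Synthetic stand-ins for observations at a single time step.
rng = np.random.default_rng(0)
expert = rng.normal(loc=1.0, size=(500, 3))   # expert observations
close = rng.normal(loc=1.0, size=(500, 3))    # learner that matches the expert
far = rng.normal(loc=-1.0, size=(500, 3))     # learner that does not

ipm_close = linear_ipm(expert, close)
ipm_far = linear_ipm(expert, far)
print(f"matching learner: {ipm_close:.3f}, mismatched learner: {ipm_far:.3f}")
```

A learner whose observation distribution matches the expert's yields a small value, while a mismatched learner yields a large one. Per the abstract, FAIL minimizes a quantity of this kind for each time-dependent policy; richer discriminator classes would require an inner maximization rather than this closed form.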
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Advanced Bandit Algorithms Research
