Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-coupled Analysis
Tian Xu, Ziniu Li, Yang Yu, Zhi-Quan Luo

TL;DR
This paper provides a theoretical analysis of adversarial imitation learning, explaining its strong performance with limited expert data and long planning horizons through a novel stage-coupled analysis.
Contribution
It introduces a horizon-free imitation gap bound for TV-AIL, clarifying why AIL performs well with few trajectories and long horizons.
Findings
The imitation gap of TV-AIL is bounded by at most 1, regardless of the planning horizon.
The bound is meaningful in both the small-sample and large-sample regimes.
The analysis leverages the multi-stage policy structure and dynamic programming.
Abstract
Imitation learning learns a policy from expert trajectories. While the expert data is believed to be crucial for imitation quality, it was found that a kind of imitation learning approach, adversarial imitation learning (AIL), can have exceptional performance. With as little as only one expert trajectory, AIL can match the expert performance even in a long horizon, on tasks such as locomotion control. There are two mysterious points in this phenomenon. First, why can AIL perform well with only a few expert trajectories? Second, why does AIL maintain good performance despite the length of the planning horizon? In this paper, we theoretically explore these two questions. For a total-variation-distance-based AIL (called TV-AIL), our analysis shows a horizon-free imitation gap on a class of instances abstracted from locomotion control tasks.…
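The quantity TV-AIL is built around, the total variation distance, can be illustrated with a minimal sketch. The abstract does not state the TV-AIL objective, so the code below only shows how a total variation distance between two empirical state-action distributions might be computed. The function names and the toy state-action pairs are hypothetical, chosen for illustration, and are not taken from the paper.

```python
from collections import Counter


def empirical_distribution(samples):
    """Turn a list of hashable samples into a dict of relative frequencies."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def total_variation(p, q):
    """Total variation distance: half the L1 distance between two distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


# Hypothetical (state, action) pairs; not data from the paper.
expert_pairs = [("s0", "a0"), ("s1", "a0"), ("s1", "a1"), ("s2", "a0")]
learner_pairs = [("s0", "a0"), ("s1", "a0"), ("s1", "a0"), ("s2", "a1")]

expert_dist = empirical_distribution(expert_pairs)
learner_dist = empirical_distribution(learner_pairs)
tv = total_variation(expert_dist, learner_dist)
print(tv)
```

A distance of 0 means the two empirical distributions coincide, and 1 means their supports are disjoint. An adversarial imitation learner of this kind would, roughly speaking, try to drive such a distance down by adjusting the learner's policy.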
