Minimax Optimal Online Imitation Learning via Replay Estimation
Gokul Swamy, Nived Rajaraman, Matthew Peng, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu, Jiantao Jiao, Kannan Ramchandran

TL;DR
This paper introduces replay estimation to reduce variance in online imitation learning, achieving near-optimal performance bounds and improving policy outcomes in continuous control tasks.
Contribution
The paper proposes replay estimation, a novel variance reduction technique, and provides theoretical guarantees showing near-optimal performance in finite sample imitation learning.
Findings
Replay estimation reduces empirical variance in imitation learning.
Theoretical bounds show near-optimal performance with weaker assumptions.
Empirical results demonstrate significant policy improvements across tasks.
Abstract
Online imitation learning is the problem of how best to mimic expert demonstrations, given access to the environment or an accurate simulator. Prior work has shown that in the infinite sample regime, exact moment matching achieves value equivalence to the expert policy. However, in the finite sample regime, even if one has no optimization error, empirical variance can lead to a performance gap that scales with $H^2 / N$ for behavioral cloning and $H / \sqrt{N}$ for online moment matching, where $H$ is the horizon and $N$ is the size of the expert dataset. We introduce the technique of replay estimation to reduce this empirical variance: by repeatedly executing cached expert actions in a stochastic simulator, we compute a smoother expert visitation distribution estimate to match. In the presence of general function approximation, we prove a meta theorem reducing the performance gap of…
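The replay idea in the abstract can be illustrated with a small tabular sketch. This is not the authors' implementation: the chain simulator `step`, the deterministic expert, the function names, and the rule of cutting off a rollout at the first state with no cached action are all assumptions made for illustration. It only shows how replaying cached expert actions through a stochastic simulator yields a visitation estimate built from many more rollouts than the handful of demonstrations.

```python
import random
from collections import Counter

def step(s, a, rng):
    """Hypothetical stochastic simulator: a 5-state chain where the action slips 20% of the time."""
    if rng.random() < 0.2:
        return s  # slip: stay in place
    return max(0, min(4, s + a))

def rollout_expert(policy, s0, horizon, rng):
    """Collect one expert trajectory as a list of (state, action) pairs."""
    traj, s = [], s0
    for _ in range(horizon):
        a = policy(s)
        traj.append((s, a))
        s = step(s, a, rng)
    return traj

def replay_visitation(cache, s0, horizon, n_rollouts, rng):
    """Estimate per-timestep state visitation by replaying cached expert actions.

    A rollout stops after recording the first state that has no cached action,
    so later timesteps may sum to less than 1 (assumed simplification).
    """
    counts = [Counter() for _ in range(horizon)]
    for _ in range(n_rollouts):
        s = s0
        for t in range(horizon):
            counts[t][s] += 1
            if s not in cache:
                break
            s = step(s, cache[s], rng)
    return [{s: c / n_rollouts for s, c in ct.items()} for ct in counts]

rng = random.Random(0)
horizon = 5
expert = lambda s: 1  # assumed deterministic expert: always move right

# A small expert dataset (N = 3 trajectories) gives a cache of state -> action.
demos = [rollout_expert(expert, 0, horizon, rng) for _ in range(3)]
cache = {s: a for traj in demos for s, a in traj}

# Replaying the cache many times smooths the visitation estimate.
estimate = replay_visitation(cache, 0, horizon, n_rollouts=2000, rng=rng)
for t, dist in enumerate(estimate):
    print(t, dict(sorted(dist.items())))
```

A learner would then be trained to match `estimate` in place of the raw empirical counts from the three demonstrations; how uncovered states are handled and how this extends to function approximation is left to the paper.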
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
