Minimax Optimal Online Imitation Learning via Replay Estimation

Gokul Swamy; Nived Rajaraman; Matthew Peng; Sanjiban Choudhury; J. Andrew Bagnell; Zhiwei Steven Wu; Jiantao Jiao; Kannan Ramchandran

arXiv:2205.15397 · cs.LG · January 18, 2023 · 1 citation

TL;DR

This paper introduces replay estimation to reduce variance in online imitation learning, achieving near-optimal performance bounds and improving policy outcomes in continuous control tasks.

Contribution

The paper proposes replay estimation, a novel variance reduction technique, and provides theoretical guarantees showing near-optimal performance in finite sample imitation learning.

Findings

01. Replay estimation reduces empirical variance in imitation learning.

02. Theoretical bounds show near-optimal performance with weaker assumptions.

03. Empirical results demonstrate significant policy improvements across tasks.

Abstract

Online imitation learning is the problem of how best to mimic expert demonstrations, given access to the environment or an accurate simulator. Prior work has shown that in the infinite sample regime, exact moment matching achieves value equivalence to the expert policy. However, in the finite sample regime, even if one has no optimization error, empirical variance can lead to a performance gap that scales with $H^{2} / N$ for behavioral cloning and $H / \sqrt{N}$ for online moment matching, where $H$ is the horizon and $N$ is the size of the expert dataset. We introduce the technique of replay estimation to reduce this empirical variance: by repeatedly executing cached expert actions in a stochastic simulator, we compute a smoother expert visitation distribution estimate to match. In the presence of general function approximation, we prove a meta theorem reducing the performance gap of…
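The replay idea described in the abstract can be illustrated with a minimal tabular sketch. This is a simplified illustration under stated assumptions, not the paper's estimator: the toy MDP, the function names (`collect_demos`, `empirical_visitation`, `replay_visitation`), and the choice to simply stop a replayed rollout when it reaches a state with no cached expert action are all hypothetical.

```python
import numpy as np

def collect_demos(P, expert, s0, H, n, rng):
    """Roll out the expert n times; return lists of (state, action) pairs."""
    demos = []
    for _ in range(n):
        s, traj = s0, []
        for t in range(H):
            a = int(expert[t, s])
            traj.append((s, a))
            s = int(rng.choice(P.shape[2], p=P[s, a]))
        demos.append(traj)
    return demos

def empirical_visitation(demos, n_states, H):
    """Plain empirical estimate: per-timestep state frequencies in the demos."""
    d = np.zeros((H, n_states))
    for traj in demos:
        for t, (s, _) in enumerate(traj):
            d[t, s] += 1
    return d / len(demos)

def replay_visitation(P, demos, s0, H, n_replays, rng):
    """Replay cached expert actions in the simulator to smooth the estimate.

    Assumption of this sketch: a rollout stops as soon as it reaches a
    (timestep, state) pair with no cached action, so later rows may sum to
    less than 1.
    """
    n_states = P.shape[0]
    # Cache the expert action at every (timestep, state) seen in the demos.
    cache = {(t, s): a for traj in demos for t, (s, a) in enumerate(traj)}
    d = np.zeros((H, n_states))
    for _ in range(n_replays):
        s = s0
        for t in range(H):
            if (t, s) not in cache:
                break  # no expert action known here; truncate the rollout
            d[t, s] += 1
            s = int(rng.choice(n_states, p=P[s, cache[(t, s)]]))
    return d / n_replays

# Hypothetical toy problem: random transitions, random deterministic expert.
rng = np.random.default_rng(0)
S, A, H = 5, 2, 6
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a distribution over s'
expert = rng.integers(A, size=(H, S))        # expert[t, s] is the expert action
demos = collect_demos(P, expert, s0=0, H=H, n=20, rng=rng)
d_emp = empirical_visitation(demos, S, H)
d_replay = replay_visitation(P, demos, s0=0, H=H, n_replays=2000, rng=rng)
```

Here `d_emp` is built from only 20 trajectories, while `d_replay` averages 2000 simulated rollouts that reuse the same cached actions, which is the sense in which replaying can give a smoother estimate. The mass lost to truncated rollouts is left unhandled in this sketch.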

Code & Models

Repositories

gkswamy98/replay_est (PyTorch, official)

Videos

Minimax Optimal Online Imitation Learning via Replay Estimation · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms