A Strong Baseline for Batch Imitation Learning
Matthew Smith, Lucas Maystre, Zhenwen Dai, Kamil Ciosek

TL;DR
This paper introduces a simple batch imitation learning algorithm that requires no hyper-parameter tuning beyond that of a standard batch RL algorithm. It comes with formal sample complexity guarantees, a robust evaluation protocol, and competitive performance on continuous control benchmarks, making it suitable for applications where safety or cost is critical.
Contribution
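The paper presents a novel, easy-to-implement imitation learning algorithm that learns solely from previously collected data, with formal sample complexity guarantees in finite Markov Decision Problems, along with a new evaluation protocol for offline RL.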
Findings
The algorithm achieves competitive results on continuous control tasks.
The paper provides formal sample complexity guarantees for the proposed method in finite Markov Decision Problems.
The paper establishes a fair evaluation protocol for offline reinforcement learning.
Abstract
Imitation of expert behaviour is a highly desirable and safe approach to the problem of sequential decision making. We provide an easy-to-implement, novel algorithm for imitation learning under a strict data paradigm, in which the agent must learn solely from data collected a priori. This paradigm allows our algorithm to be used for environments in which safety or cost are of critical concern. Our algorithm requires no additional hyper-parameter tuning beyond any standard batch reinforcement learning (RL) algorithm, making it an ideal baseline for such data-strict regimes. Furthermore, we provide formal sample complexity guarantees for the algorithm in finite Markov Decision Problems. In doing so, we formally demonstrate an unproven claim from Kearns & Singh (1998). On the empirical side, our contribution is twofold. First, we develop a practical, robust and principled evaluation…
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
