On Efficient Online Imitation Learning via Classification
Yichen Li, Chicheng Zhang

TL;DR
This paper investigates the limits and possibilities of classification-based online imitation learning, proposing new algorithms that improve sample efficiency and analyzing fundamental computational barriers in the nonrealizable setting.
Contribution
The paper introduces the Logger framework for improper online learning in classification-based online imitation learning (COIL), designs two oracle-efficient algorithms, and establishes theoretical limitations on dynamic regret minimization.
Findings
Proper online algorithms cannot guarantee sublinear regret in general.
The Logger framework reduces COIL to online linear optimization.
Proposed algorithms outperform naive behavior cloning in finite-sample settings.
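As a rough illustration of what "online linear optimization" refers to in the findings above, the sketch below runs Hedge (exponential weights) over mixture weights on a small, finite set of base policies. It is a generic textbook procedure, not the paper's Logger algorithms; the finite policy set, the function names `hedge_weights` and `run_hedge`, and the toy loss vectors are all assumptions made for illustration.

```python
import math


def hedge_weights(cumulative_losses, eta):
    """Exponential weights: a point on the probability simplex."""
    # Subtract the minimum before exponentiating for numerical stability.
    m = min(cumulative_losses)
    raw = [math.exp(-eta * (c - m)) for c in cumulative_losses]
    total = sum(raw)
    return [r / total for r in raw]


def run_hedge(loss_vectors, eta):
    """Play Hedge against a sequence of linear losses with entries in [0, 1].

    Each loss vector holds one (hypothetical) loss per base policy; the
    learner's loss in a round is the inner product of its mixture weights
    with that vector, so the loss is linear in the weights.
    """
    k = len(loss_vectors[0])
    cumulative = [0.0] * k
    learner_loss = 0.0
    for loss in loss_vectors:
        w = hedge_weights(cumulative, eta)
        learner_loss += sum(wi * li for wi, li in zip(w, loss))
        cumulative = [c + l for c, l in zip(cumulative, loss)]
    best_fixed = min(cumulative)  # loss of the best single base policy
    return learner_loss, best_fixed, hedge_weights(cumulative, eta)


# Toy run: two base policies; the second one always incurs zero loss.
T, K = 50, 2
eta = math.sqrt(8 * math.log(K) / T)
learner_loss, best_fixed, final_w = run_hedge([[1.0, 0.0]] * T, eta)
regret = learner_loss - best_fixed
```

With this step size, the standard Hedge analysis bounds the regret against the best fixed base policy by sqrt(T ln K / 2), which grows sublinearly in T. The improper (mixture) play is what makes the per-round loss linear in the learner's decision.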
Abstract
Imitation learning (IL) is a general learning paradigm for tackling sequential decision-making problems. Interactive imitation learning, where learners can interactively query for expert demonstrations, has been shown to achieve provably superior sample efficiency guarantees compared with its offline counterpart or reinforcement learning. In this work, we study classification-based online imitation learning (abbrev. COIL) and the fundamental feasibility of designing oracle-efficient regret-minimization algorithms in this setting, with a focus on the general nonrealizable case. We make the following contributions: (1) we show that in the COIL problem, any proper online learning algorithm cannot guarantee a sublinear regret in general; (2) we propose Logger, an improper online learning algorithmic framework, that reduces COIL to online linear…
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms
