Interactive and Hybrid Imitation Learning: Provably Beating Behavior Cloning
Yichen Li, Chicheng Zhang

TL;DR
This paper demonstrates that interactive and hybrid imitation learning algorithms, which utilize state-wise annotations and combine offline and interactive data, can provably outperform traditional behavior cloning under certain cost conditions, with empirical validation on control tasks.
Contribution
The paper introduces the first theoretical and empirical analysis showing that state-wise interactive annotations and hybrid learning methods can surpass behavior cloning in imitation learning.
Findings
Stagger outperforms BC under low recovery cost settings.
Warm Stagger effectively combines offline and interactive data with strong guarantees.
Interactive and hybrid methods outperform BC in MuJoCo tasks with modest annotation cost ratios.
Abstract
Imitation learning (IL) is a paradigm for learning sequential decision-making policies from experts, leveraging offline demonstrations, interactive annotations, or both. Recent advances show that when annotation cost is tallied per trajectory, Behavior Cloning (BC), which relies solely on offline demonstrations, cannot be improved upon in general, leaving only limited conditions under which interactive methods such as DAgger can help. We revisit this conclusion and prove that when the annotation cost is measured per state, algorithms using interactive annotations can provably outperform BC. Specifically: (1) we show that Stagger, a one-sample-per-round variant of DAgger, provably beats BC under low-recovery-cost settings; (2) we initiate the study of hybrid IL, where the agent learns from both offline demonstrations and interactive annotations. We propose Warm Stagger whose learning guarantee is not much worse…
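To make the "one sample per round" idea concrete, here is a minimal sketch of a DAgger-style loop that asks the expert to label a single visited state per round, so that annotation cost is counted per state rather than per trajectory. Everything in it is an assumption for illustration and is not taken from the paper: the toy chain environment, the `expert_action` function, the uniform choice of the queried state, and the majority-vote table that stands in for whatever learner the authors actually use.

```python
import random
from collections import Counter, defaultdict

N_STATES = 5   # hypothetical toy chain with states 0..4
N_ACTIONS = 2  # action 0 moves left, action 1 moves right
HORIZON = 6    # steps per rollout


def expert_action(state):
    # Hypothetical deterministic expert: always move right.
    return 1


def step(state, action):
    # Move one cell left or right, clipped to the ends of the chain.
    delta = 1 if action == 1 else -1
    return min(max(state + delta, 0), N_STATES - 1)


def make_policy(labels):
    # Majority vote over the expert labels seen so far for each state;
    # states with no label yet get a uniformly random action.
    def policy(state, rng):
        votes = labels.get(state)
        if not votes:
            return rng.randrange(N_ACTIONS)
        return votes.most_common(1)[0][0]
    return policy


def rollout(policy, rng):
    # Run the learner's own policy and record the states it visits.
    state, visited = 0, []
    for _ in range(HORIZON):
        visited.append(state)
        state = step(state, policy(state, rng))
    return visited


def one_sample_per_round(n_rounds, seed=0):
    rng = random.Random(seed)
    labels = defaultdict(Counter)
    for _ in range(n_rounds):
        policy = make_policy(labels)
        visited = rollout(policy, rng)
        queried = rng.choice(visited)  # exactly one state is annotated per round
        labels[queried][expert_action(queried)] += 1
    return make_policy(labels), labels


policy, labels = one_sample_per_round(n_rounds=50)
total_annotations = sum(sum(votes.values()) for votes in labels.values())
print("state-wise annotations used:", total_annotations)
```

In this sketch the number of expert queries equals the number of rounds, whereas labeling every state of every rollout would cost `HORIZON` annotations per round. That difference in accounting is what the per-state cost model in the abstract is about.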
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Evolutionary Algorithms and Applications
