A Critique of Strictly Batch Imitation Learning
Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu

TL;DR
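This paper critically examines strictly batch imitation learning. It argues that the energy-based approach of Jarrett et al. optimizes a pseudo-state distribution that may be disconnected from the policy's true state visitation distribution, and it constructs examples where that approach yields inconsistent estimates of the expert's policy while behavioral cloning does not.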
Contribution
It critiques recent energy-based approaches to offline imitation learning, emphasizing their disconnect from true state visitation distributions and the inconsistent policy estimates that their parameter coupling can produce.
Findings
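The pseudo-state visitation distribution optimized by the energy-based model may be disconnected from the policy's true state visitation distribution.
Coupling the parameters of the policy and the state distribution can lead to inconsistent estimates of the expert's policy.
In the constructed examples, behavioral cloning recovers the expert's policy consistently where the coupled estimator does not.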
Abstract
Recent work by Jarrett et al. attempts to frame the problem of offline imitation learning (IL) as one of learning a joint energy-based model, with the hope of outperforming standard behavioral cloning. We suggest that notational issues obscure how the pseudo-state visitation distribution the authors propose to optimize might be disconnected from the policy's state visitation distribution. We further construct natural examples where the parameter coupling advocated by Jarrett et al. leads to inconsistent estimates of the expert's policy, unlike behavioral cloning.
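To make the parameter-coupling concern concrete, here is a minimal toy sketch. It is not the construction from the paper. It assumes a hypothetical two-state, two-action problem with a single shared scalar parameter `theta`, and it assumes the coupled objective has the form log pi_theta(a|s) + log rho_theta(s), where pi_theta is a softmax over logits f_theta(s, a) and rho_theta(s) is proportional to sum_a exp f_theta(s, a). That is our reading of the energy-based formulation, not a quotation of it. The expert's policy is realizable by the model (at theta = 0), and both objectives are evaluated at population frequencies, so any gap would persist with infinite data.

```python
import math

# Hypothetical toy problem: 2 states x 2 actions, one shared scalar parameter theta.
# Only the logit f(0, 1) = theta depends on theta; all other logits are fixed at 0.
def logits(theta):
    return {0: [0.0, theta], 1: [0.0, 0.0]}

def log_policy(theta, s, a):
    """log pi_theta(a|s): softmax over the action logits in state s."""
    f = logits(theta)[s]
    return f[a] - math.log(sum(math.exp(x) for x in f))

def log_pseudo_state(theta, s):
    """log rho_theta(s), with rho_theta(s) proportional to sum_a exp f_theta(s, a)."""
    f = logits(theta)
    mass = {t: sum(math.exp(x) for x in f[t]) for t in f}
    return math.log(mass[s]) - math.log(sum(mass.values()))

# Hypothetical expert frequencies: state 0 is visited with prob q (set by the dynamics);
# the expert picks action 1 in state 0 with prob p and acts uniformly in state 1.
q, p = 0.8, 0.5
data = {(0, 0): q * (1 - p), (0, 1): q * p, (1, 0): (1 - q) / 2, (1, 1): (1 - q) / 2}

def bc_objective(theta):
    # Behavioral cloning: expected conditional log-likelihood of the expert's actions.
    return sum(w * log_policy(theta, s, a) for (s, a), w in data.items())

def joint_objective(theta):
    # Coupled objective: the same logits also define the pseudo-state distribution.
    return sum(w * (log_policy(theta, s, a) + log_pseudo_state(theta, s))
               for (s, a), w in data.items())

def argmax_on_grid(obj, lo=-5.0, hi=5.0, n=20_001):
    # Both objectives are concave in theta here, so a fine grid search suffices.
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return max(grid, key=obj)

theta_bc = argmax_on_grid(bc_objective)
theta_joint = argmax_on_grid(joint_objective)
pi_bc = math.exp(log_policy(theta_bc, 0, 1))
pi_joint = math.exp(log_policy(theta_joint, 0, 1))
print(f"BC:    theta={theta_bc:.3f}, pi(a=1|s=0)={pi_bc:.3f}")        # 0.000, 0.500 (matches p)
print(f"joint: theta={theta_joint:.3f}, pi(a=1|s=0)={pi_joint:.3f}")  # 0.693, 0.667
```

Since log pi_theta(a|s) + log rho_theta(s) = f_theta(s, a) - log sum exp f_theta, the coupled objective in this toy is the likelihood of a joint energy-based model over (s, a). Its maximizer matches the joint frequency of (s=0, a=1), i.e. e^theta / (3 + e^theta) = q * p = 0.4, so theta = ln 2 instead of the expert's theta = 0. The shared parameter is pulled toward explaining how often state 0 is visited, a quantity governed by the dynamics rather than by the policy, while behavioral cloning recovers p exactly.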
Taxonomy
Topics: Human Pose and Action Recognition
