A Critique of Strictly Batch Imitation Learning

Gokul Swamy; Sanjiban Choudhury; J. Andrew Bagnell; Zhiwei Steven Wu

arXiv:2110.02063 · cs.LG · October 6, 2021 · 1 citation


TL;DR

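This paper critiques the energy-based formulation of strictly batch (offline) imitation learning proposed by Jarrett et al. It argues that the pseudo-state visitation distribution this formulation optimizes can be disconnected from the policy's true state visitation distribution, and it constructs natural examples in which the formulation yields inconsistent estimates of the expert's policy, unlike behavioral cloning.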

Contribution

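A critique of the energy-based approach of Jarrett et al. to offline imitation learning, showing that the pseudo-state visitation distribution it optimizes may be disconnected from the policy's true state visitation distribution, and that its parameter coupling can produce inconsistent estimates of the expert's policy.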

Findings

01

The pseudo-state visitation distribution optimized by the energy-based model may be disconnected from the policy's true state visitation distribution.

02

The parameter coupling advocated by Jarrett et al. can lead to inconsistent estimates of the expert's policy.

03

In the same constructed examples, behavioral cloning gives consistent estimates of the expert's policy.
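To make the first finding concrete, here is a minimal sketch under assumptions of our own: a tabular logit table f[s, a], a policy given by a softmax over actions, and a pseudo-state distribution ρ(s) ∝ Σ_a exp f[s, a], which is how we read the energy-based parameterization of Jarrett et al. The helper names and the two-state MDP are hypothetical. Because ρ is computed from the logits alone, it cannot respond to a change in the transition dynamics, whereas the policy's true discounted state visitation distribution does.

```python
import numpy as np

def policy_and_pseudo_state(f):
    """Hypothetical helper. f[s, a] holds logits for every state/action pair.

    Returns pi[s, a] (softmax over actions) and rho[s] proportional to
    sum_a exp f[s, a], i.e. the energy-based "pseudo" state distribution.
    """
    lse = np.logaddexp.reduce(f, axis=1)            # log sum_a exp f[s, a]
    pi = np.exp(f - lse[:, None])                   # pi(a | s)
    rho = np.exp(lse - np.logaddexp.reduce(lse))    # normalize over states
    return pi, rho

def discounted_visitation(pi, P, mu0, gamma):
    """Hypothetical helper. True discounted state visitation of pi.

    P[s, a, s2] are transition probabilities and mu0 is the initial
    distribution; returns d(s) = (1 - gamma) * sum_t gamma^t * Pr(s_t = s).
    """
    P_pi = np.einsum("sa,sat->st", pi, P)           # state-to-state kernel under pi
    n = P_pi.shape[0]
    # d^T (I - gamma P_pi) = (1 - gamma) mu0^T, so solve the transposed system
    return (1 - gamma) * np.linalg.solve((np.eye(n) - gamma * P_pi).T, mu0)

f = np.array([[1.0, 0.0],
              [0.0, 0.0]])
pi, rho = policy_and_pseudo_state(f)
mu0 = np.array([0.5, 0.5])
gamma = 0.9

# Two hypothetical dynamics: every action leads to state 0 (A) or to state 1 (B).
P_A = np.zeros((2, 2, 2))
P_A[:, :, 0] = 1.0
P_B = np.zeros((2, 2, 2))
P_B[:, :, 1] = 1.0

d_A = discounted_visitation(pi, P_A, mu0, gamma)
d_B = discounted_visitation(pi, P_B, mu0, gamma)
print("pseudo-state rho:", np.round(rho, 3))
print("true d under A:  ", np.round(d_A, 3))
print("true d under B:  ", np.round(d_B, 3))
```

With these logits, ρ ≈ (0.65, 0.35) under both dynamics, while the true visitation moves from (0.95, 0.05) under A to (0.05, 0.95) under B. Any link between ρ and the state distribution actually induced by the policy therefore has to come from the training data, not from the parameterization itself.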

Abstract

Recent work by Jarrett et al. attempts to frame the problem of offline imitation learning (IL) as one of learning a joint energy-based model, with the hope of outperforming standard behavioral cloning. We suggest that notational issues obscure how the pseudo-state visitation distribution the authors propose to optimize might be disconnected from the policy's true state visitation distribution. We further construct natural examples where the parameter coupling advocated by Jarrett et al. leads to inconsistent estimates of the expert's policy, unlike behavioral cloning.
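The consistency claim can be illustrated with a toy construction of our own (not the paper's example). We assume the energy-based objective is the joint log-likelihood log π_θ(a|s) + log ρ_θ(s) with ρ_θ(s) ∝ Σ_a exp f_θ(s)[a]. Two equally likely states with features x = ±1 share a single parameter θ through logits f_θ(s) = [θx, 0], and the expert uses θ* = 2, so the policy class is well specified. Both objectives are evaluated in population (infinite-data) form and maximized on a grid; all names are hypothetical.

```python
import numpy as np

# Hypothetical toy setting: two states with scalar features +1 and -1,
# visited equally often, and two actions with logits [theta * x, 0].
X = np.array([1.0, -1.0])          # state features
D_STATE = np.array([0.5, 0.5])     # true state visitation distribution
THETA_STAR = 2.0                   # expert parameter

def log_sigmoid(z):
    # numerically stable log(1 / (1 + exp(-z)))
    return -np.logaddexp(0.0, -z)

# Expert probability of action 0 in each state: sigmoid(theta* x)
P_EXPERT = np.exp(log_sigmoid(THETA_STAR * X))

def bc_objective(theta):
    # Population conditional log-likelihood E_{s~d, a~pi_E}[log pi_theta(a|s)]
    z = theta * X
    ll = P_EXPERT * log_sigmoid(z) + (1.0 - P_EXPERT) * log_sigmoid(-z)
    return float(np.sum(D_STATE * ll))

def state_objective(theta):
    # Population log-likelihood of states under rho_theta(s) ∝ sum_a exp f_theta(s)[a]
    log_unnorm = np.logaddexp(theta * X, 0.0)       # logsumexp over [theta*x, 0]
    log_rho = log_unnorm - np.logaddexp.reduce(log_unnorm)
    return float(np.sum(D_STATE * log_rho))

def argmax_on_grid(objective, grid):
    values = np.array([objective(t) for t in grid])
    return float(grid[np.argmax(values)])

grid = np.linspace(-5.0, 5.0, 10001)
theta_bc = argmax_on_grid(bc_objective, grid)
theta_joint = argmax_on_grid(lambda t: bc_objective(t) + state_objective(t), grid)
print(f"BC estimate:    {theta_bc:.3f}")
print(f"joint estimate: {theta_joint:.3f}")
```

Behavioral cloning recovers θ* ≈ 2, while the joint estimate settles near 0.8. In this construction the state term reduces to ½ log σ(θ) + ½ log σ(−θ), which is maximized at θ = 0, so the shared parameter is pulled away from θ* no matter how much data is available.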


Taxonomy

Topics: Human Pose and Action Recognition