A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search
Arnav Kumar Jain, Vibhakar Mohta, Subin Kim, Atiksh Bhardwaj, Juntao Ren, Yunhai Feng, Sanjiban Choudhury, Gokul Swamy

TL;DR
This paper introduces SAILOR, a learning-to-search approach to imitation learning that improves recovery from mistakes and outperforms behavioral cloning on visual manipulation tasks.
Contribution
It proposes a learning-to-search framework built on a learned world model and a learned reward model, which improves robustness and recovery from mistakes in imitation learning.
Findings
SAILOR outperforms state-of-the-art diffusion policies on multiple benchmarks.
Scaling up the number of demonstrations used for behavioral cloning does not close the performance gap to SAILOR.
SAILOR effectively identifies failures and resists reward hacking.
Abstract
The fundamental limitation of the behavioral cloning (BC) approach to imitation learning is that it only teaches an agent what the expert did at states the expert visited. This means that when a BC agent makes a mistake which takes them out of the support of the demonstrations, they often don't know how to recover from it. In this sense, BC is akin to giving the agent the fish -- giving them dense supervision across a narrow set of states -- rather than teaching them to fish: to be able to reason independently about achieving the expert's outcome even when faced with unseen situations at test-time. In response, we explore learning to search (L2S) from expert demonstrations, i.e. learning the components required to, at test time, plan to match expert outcomes, even after making a mistake. These include (1) a world model and (2) a reward model. We carefully ablate the set of algorithmic…
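To make the idea of planning at test time with a world model and a reward model concrete, here is a minimal sketch. It is not SAILOR's implementation: the abstract excerpt above does not specify the planner or the model architectures, so everything here is an assumption made for illustration. The names `world_model`, `reward_model`, and `plan` are hypothetical, the "learned" models are replaced by toy hand-written functions on a 1-D state, and the planner is plain random shooting (sample action sequences, roll them out in the world model, score them with the reward model, execute the first action of the best one).

```python
import random


def world_model(state, action):
    # Stand-in for a learned dynamics model (assumption): a toy 1-D integrator.
    return state + action


def reward_model(state, goal=0.0):
    # Stand-in for a learned reward model (assumption): negative squared
    # distance to a goal state that plays the role of the expert's outcome.
    return -(state - goal) ** 2


def plan(state, horizon=5, num_candidates=256, rng=None):
    """Random-shooting planner: return the first action of the best rollout."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    best_return = float("-inf")
    best_first_action = 0.0
    for _ in range(num_candidates):
        # Sample one candidate action sequence with actions in [-1, 1].
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            s = world_model(s, a)      # imagine the next state
            total += reward_model(s)   # score the imagined state
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action


if __name__ == "__main__":
    # Start far from the goal, as if the agent had drifted off the
    # demonstrations, and replan at every step.
    state = 3.0
    for _ in range(10):
        state = world_model(state, plan(state))
    print(f"final distance to goal: {abs(state):.3f}")
```

Because the planner only needs the two models, it can still propose sensible actions from states that no demonstration visited, which is the recovery behavior the abstract contrasts with behavioral cloning.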
Taxonomy
Methods: Diffusion · Sparse Evolutionary Training
