A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search

Arnav Kumar Jain; Vibhakar Mohta; Subin Kim; Atiksh Bhardwaj; Juntao Ren; Yunhai Feng; Sanjiban Choudhury; Gokul Swamy

arXiv:2506.05294·cs.LG·October 27, 2025


TL;DR

This paper introduces SAILOR, a learning-to-search (L2S) approach to imitation learning that recovers from mistakes more reliably than behavioral cloning and outperforms it on visual manipulation tasks.

Contribution

It proposes a learning-to-search framework that learns a world model and a reward model from expert demonstrations and uses them to plan at test time, improving robustness and recovery from mistakes.

Findings

01. SAILOR outperforms state-of-the-art diffusion policies on multiple benchmarks.

02. Scaling demonstrations for behavioral cloning does not close the performance gap.

03. SAILOR effectively identifies failures and resists reward hacking.

Abstract

The fundamental limitation of the behavioral cloning (BC) approach to imitation learning is that it only teaches an agent what the expert did at states the expert visited. This means that when a BC agent makes a mistake which takes them out of the support of the demonstrations, they often don't know how to recover from it. In this sense, BC is akin to giving the agent the fish -- giving them dense supervision across a narrow set of states -- rather than teaching them to fish: to be able to reason independently about achieving the expert's outcome even when faced with unseen situations at test-time. In response, we explore learning to search (L2S) from expert demonstrations, i.e. learning the components required to, at test time, plan to match expert outcomes, even after making a mistake. These include (1) a world model and (2) a reward model. We carefully ablate the set of algorithmic…
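To make the test-time planning idea concrete, here is a minimal toy sketch, not the paper's implementation. It assumes a 1-D state, and `world_model`, `reward_model`, `base_policy`, `plan`, and `rollout` are hypothetical hand-written stand-ins for the learned components. The planner is plain random shooting, chosen here only for illustration.

```python
import random

GOAL = 0.0  # assumed stand-in for the expert's outcome


def world_model(state, action):
    # Hypothetical stand-in for a learned dynamics model: a 1-D point that moves by `action`.
    return state + action


def reward_model(state):
    # Hypothetical stand-in for a learned reward: higher when closer to the expert outcome.
    return -abs(state - GOAL)


def base_policy(state):
    # Hypothetical stand-in for a BC policy that only ever saw one expert action.
    return -0.1


def plan(state, rng, horizon=5, num_samples=64, noise=0.5):
    """Random-shooting search: perturb the base policy, imagine rollouts, keep the best."""
    best_return = float("-inf")
    best_first_action = base_policy(state)
    for _ in range(num_samples):
        s, total, first_action = state, 0.0, None
        for _ in range(horizon):
            # Sample an action near the base policy's proposal and clip it to [-1, 1].
            a = max(-1.0, min(1.0, base_policy(s) + rng.gauss(0.0, noise)))
            if first_action is None:
                first_action = a
            s = world_model(s, a)      # imagined next state
            total += reward_model(s)   # imagined reward
        if total > best_return:
            best_return, best_first_action = total, first_action
    return best_first_action


def rollout(start, steps=30, use_planner=True, seed=0):
    # The same toy dynamics double as the "real" environment in this sketch.
    rng = random.Random(seed)
    s = start
    for _ in range(steps):
        a = plan(s, rng) if use_planner else base_policy(s)
        s = world_model(s, a)
    return s


if __name__ == "__main__":
    start = -2.0  # a state the fixed-action base policy cannot recover from
    print("base policy only:", rollout(start, use_planner=False))
    print("with planning:   ", rollout(start, use_planner=True))
```

Starting from a state where the fixed-action base policy keeps moving away from the goal, planning through the two models steers the agent back toward it. That is the recovery behavior the abstract describes, shown here only in a toy setting.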

Code & Models

Repositories

arnavkj1995/sailor (PyTorch, official)

Models

vib2810/sailor_ckpts (Hugging Face model)

Videos

A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search (SlidesLive)

Taxonomy

Methods: Diffusion · Sparse Evolutionary Training