Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift

Surbhi Goel; Jonathan Pei; James Wang

arXiv:2605.09183·cs.LG·May 19, 2026

TL;DR

This paper introduces a selective imitation learning framework in which the learner may stop when it cannot reliably imitate the expert under arbitrary dynamics shift. The resulting policy rarely stops in the training environment and incurs low regret before stopping in the test environment.

Contribution

The authors propose SeqRejectron, an algorithm for selective imitation that constructs stopping rules with horizon-free guarantees, handling deterministic and stochastic policies under dynamics shifts.

Findings

01

Horizon-free sample complexity for deterministic policies: $\tilde{O}\left(\frac{\log|\Pi|}{\epsilon^2}\right)$

02

Extension to stochastic policies with cumulative Hellinger stopping time

03

Framework degrades gracefully with expert misspecification
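
For reference, the squared Hellinger distance between two action distributions $p$ and $q$ on a finite action set $\mathcal{A}$ is commonly defined as shown here. Conventions differ on the factor of $\frac{1}{2}$, and this summary states neither which convention the paper uses nor how the per-step distances are accumulated into a stopping time, so the symbols $p$, $q$, and $\mathcal{A}$ are illustrative assumptions only.

$$H^2(p, q) = \frac{1}{2} \sum_{a \in \mathcal{A}} \left(\sqrt{p(a)} - \sqrt{q(a)}\right)^2 = 1 - \sum_{a \in \mathcal{A}} \sqrt{p(a)\, q(a)}$$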

Abstract

Behavior cloning provides strong imitation learning guarantees when training and test environments share the same dynamics. However, in many deployment settings the test environment's transitions differ from training, and classical offline IL offers no recourse: the learner must commit to an action at every state, even when its demonstrations are uninformative and could lead to arbitrary degradation of performance. This motivates the study of selective imitation, where the learner may choose to stop when it cannot act reliably. We introduce a model for selective imitation under arbitrary dynamics shift: given labeled expert demonstrations from a training environment and unlabeled state trajectories from the same expert in a test environment, the learner outputs a selective policy that is complete (rarely stops in training) and sound (incurs low regret before stopping in test). Our…
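
To make the idea of a selective policy concrete, here is a minimal sketch of a simple disagreement-based stopping rule over a finite class of deterministic policies. It is not SeqRejectron, whose construction is not given in this summary, and it ignores the unlabeled test trajectories the paper uses. All names (`consistent_policies`, `selective_action`, `rollout`, the toy threshold policies) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

State = Hashable
Action = Hashable
Policy = Callable[[State], Action]


def consistent_policies(
    policy_class: Sequence[Policy],
    demos: Sequence[Tuple[State, Action]],
) -> List[Policy]:
    """Keep the candidates that match every labeled (state, action) pair."""
    return [pi for pi in policy_class if all(pi(s) == a for s, a in demos)]


def selective_action(version_space: Sequence[Policy], state: State) -> Optional[Action]:
    """Return the agreed action, or None (stop) when the survivors disagree."""
    actions = {pi(state) for pi in version_space}
    return next(iter(actions)) if len(actions) == 1 else None


def rollout(version_space: Sequence[Policy], states: Sequence[State]) -> List[Action]:
    """Act along a state sequence until the first stop; return the actions taken."""
    taken: List[Action] = []
    for s in states:
        a = selective_action(version_space, s)
        if a is None:  # the learner stops instead of guessing
            break
        taken.append(a)
    return taken


# Toy policy class (an assumption): threshold policies pi_k(s) = 1 if s >= k else 0.
policy_class = [lambda s, k=k: int(s >= k) for k in range(6)]
# Labeled training demonstrations; they pin the threshold down to k in {2, 3}.
demos = [(0, 0), (1, 0), (3, 1)]

version_space = consistent_policies(policy_class, demos)
# State 2 is where the surviving policies disagree, so the rollout stops there.
actions_before_stop = rollout(version_space, [0, 4, 2, 4])
```

In this toy setting the learner acts wherever the training data determines the expert's action and stops at the first state where it does not, which mirrors the complete and sound requirements only informally.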
