Breaking Lock-In: Preserving Steerability under Low-Data VLA Post-Training
Suning Huang, Jiaqi Shao, Ke Wang, Qianzhong Chen, Jiankai Sun, Yanjiang Guo, Mac Schwager, Jeannette Bohg

TL;DR
DeLock preserves the generalization ability of vision-language-action (VLA) policies after low-data post-training by maintaining visual grounding and guiding behavior with contrastive prompts.
Contribution
The paper introduces DeLock, an approach that mitigates lock-in during low-data post-training without additional supervision, relying instead on the policy's internal pre-trained knowledge, and that outperforms the baselines it is compared against.
Findings
DeLock outperforms strong baselines across eight evaluations.
DeLock matches or exceeds the performance of policies trained with more data.
DeLock effectively preserves visual grounding during post-training.
Abstract
Have you ever post-trained a generalist vision-language-action (VLA) policy on a small demonstration dataset, only to find that it stops responding to new instructions and is limited to behaviors observed during post-training? We identify this phenomenon as lock-in: after low-data, supervised fine-tuning (SFT), the policy becomes overly specialized to the post-training data and fails to generalize to novel instructions, manifesting as concept lock-in (fixation on training objects/attributes) and spatial lock-in (fixation on training spatial targets). Many existing remedies introduce additional supervision signals, such as those derived from foundation models or auxiliary objectives, or rely on augmented datasets to recover generalization. In this paper, we show that the policy's internal pre-trained knowledge is sufficient: DeLock mitigates lock-in by preserving visual grounding during…
