StageCraft: Execution Aware Mitigation of Distractor and Obstruction Failures in VLA Models

Kartikay Milind Pangaonkar; Prabin Rath; Omkar Patil; Nakul Gopalan

arXiv:2603.20659 · cs.RO · March 24, 2026

TL;DR

StageCraft is a training-free, VLM-based method that improves pretrained vision-language-action (VLA) policies by manipulating the environment's initial state to mitigate failures caused by distractors and obstructions, with a reported 40% performance improvement on real-world tasks.

Contribution

We introduce StageCraft, an environment-manipulation approach that uses large vision-language models to make VLA policies more robust without additional training.

Findings

01 40% performance improvement in real-world tasks

02 Effective in diverse distractor and obstruction scenarios

03 Adapts intervention based on policy strength

Abstract

Large-scale pre-training on text and image data, along with diverse robot demonstrations, has helped Vision-Language-Action models (VLAs) generalize to novel tasks, objects, and scenes. However, these models are still susceptible to failure in the presence of execution-time impediments such as distractors and physical obstructions in the robot's workspace. Existing policy improvement methods fine-tune base VLAs to improve generalization, yet they still struggle in unseen distractor settings. To address this problem, we investigate whether internet-scale pretraining of large vision-language models (VLMs) can be leveraged to reason about these impediments and mitigate policy failures. To this end, we propose StageCraft, a training-free approach to improve pretrained VLA policy performance by manipulating the environment's initial state using VLM-based in-context reasoning. StageCraft takes…
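The abstract is cut off above, so the paper's actual pipeline is not shown on this page. As a rough illustration of the general idea only (clearing impediments from the initial scene before a policy runs), here is a minimal Python sketch. Every name in it (`SceneObject`, `stub_reasoner`, `stage_scene`) is hypothetical, and the "reasoner" is a hard-coded stand-in for a VLM query, not a real model call or the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SceneObject:
    name: str
    # Assumed label for illustration; in practice a VLM would infer this
    # from images and the task instruction rather than read a field.
    role: str  # "target", "distractor", "obstruction", or "neutral"


def stub_reasoner(task: str, scene: List[SceneObject]) -> List[str]:
    """Hypothetical stand-in for a VLM query.

    Returns the names of objects judged to impede the task.
    """
    return [o.name for o in scene if o.role in ("distractor", "obstruction")]


def stage_scene(
    task: str,
    scene: List[SceneObject],
    reasoner: Callable[[str, List[SceneObject]], List[str]],
) -> List[SceneObject]:
    """Return the initial scene with flagged impediments moved away."""
    flagged = set(reasoner(task, scene))
    return [o for o in scene if o.name not in flagged]


scene = [
    SceneObject("red mug", "target"),
    SceneObject("blue mug", "distractor"),
    SceneObject("cereal box", "obstruction"),
    SceneObject("plate", "neutral"),
]

# The staged scene would then be handed to an unchanged, pretrained policy.
staged = stage_scene("pick up the red mug", scene, stub_reasoner)
staged_names = [o.name for o in staged]
```

The design point of the sketch is that the policy itself is never modified: all intervention happens on the scene before execution, which is what "training-free" means in the abstract.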

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)