Pre-VLA: Preemptive Runtime Verification for Reliable Vision-Language-Action and World-Model Rollouts

Zhen Sun; Yongjian Guo; Haoran Sun; Luqiao Wang; Wei Lu; Jiachi Ji; Shengzhe Ji; Junwu Xiong; Zhijun Meng

arXiv:2605.22446 · cs.CV · May 22, 2026

TL;DR

Pre-VLA introduces a preemptive verification system for vision-language-action models to improve safety and efficiency during real-world deployment by filtering low-quality actions before execution.

Contribution

The paper proposes a unified runtime verification architecture, built on a multimodal backbone and a dual-mode scheduler, for assessing action validity in embodied AI.

Findings

01 Raises the success rate on the LIBERO benchmark from 30.79% to 37.62%.

02 Reduces the number of task execution steps and mitigates error accumulation.

03 Averages 183.9 ms of verification time per action.
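
For scale, the reported figures can be restated with a little arithmetic. The snippet below is only a convenience calculation on the numbers listed above; the variable names are illustrative, and the throughput line assumes verifications run strictly one after another, which the summary does not state.

```python
# Reported figures from the findings above.
baseline_success = 30.79   # success rate (%) without verification
verified_success = 37.62   # success rate (%) with Pre-VLA
verify_ms = 183.9          # average verification time per action (ms)

# Absolute gain in percentage points and relative gain over the baseline.
abs_gain = verified_success - baseline_success
rel_gain = abs_gain / baseline_success * 100.0

# Assumption: sequential verification, one action at a time.
verifications_per_second = 1000.0 / verify_ms

print(f"absolute gain: {abs_gain:.2f} percentage points")
print(f"relative gain: {rel_gain:.1f}%")
print(f"sequential throughput: {verifications_per_second:.2f} verifications/s")
```

The distinction matters when comparing against other work: the improvement is under 7 percentage points in absolute terms but over a fifth of the baseline in relative terms.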

Abstract

While large vision-language-action (VLA) models and generative world models (WM) have advanced long-horizon embodied intelligence, their practical deployment remains challenged by uncertainty in learning-based action generation. Low-quality actions may cause physical failures during execution or lead to misleading world-model rollouts with redundant rendering costs. To address this issue, we propose Pre-VLA, a unified runtime verification architecture that performs preemptive action validity assessment before physical execution or world-model imagination. Pre-VLA leverages an efficient multimodal backbone with modality-aware pooling and a lightweight dual-branch head to predict both safety confidence and critic-derived advantage scores for candidate action chunks. To handle severe class imbalance and unstable boundary decisions, we train Pre-VLA with a multi-task objective combining…
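
To make the idea of preemptive filtering concrete, here is a minimal sketch of how the two predicted scores could gate candidate action chunks before execution or world-model imagination. It is not the authors' implementation: the `ScoredChunk` type, the `select_chunk` function, the threshold values, and the rule of picking the highest-advantage survivor are all assumptions made for illustration, and the scores are hard-coded rather than produced by a model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredChunk:
    """A candidate action chunk with two scores (hypothetical container)."""
    name: str
    actions: List[List[float]]  # T steps x action dimensions
    safety_confidence: float    # assumed to be a probability-like score in [0, 1]
    advantage: float            # assumed critic-derived advantage estimate


def select_chunk(
    candidates: List[ScoredChunk],
    safety_threshold: float = 0.5,  # assumed value, not taken from the paper
    min_advantage: float = 0.0,     # assumed value, not taken from the paper
) -> Optional[ScoredChunk]:
    """Return the highest-advantage chunk that passes both gates, else None.

    Returning None signals that nothing should be executed or rendered,
    so the caller can resample instead of acting on a low-quality chunk.
    """
    accepted = [
        c for c in candidates
        if c.safety_confidence >= safety_threshold and c.advantage >= min_advantage
    ]
    if not accepted:
        return None
    return max(accepted, key=lambda c: c.advantage)


# Toy candidates with made-up scores.
candidates = [
    ScoredChunk("A", [[0.0, 0.1]], safety_confidence=0.92, advantage=0.10),
    ScoredChunk("B", [[0.5, 0.5]], safety_confidence=0.35, advantage=0.80),
    ScoredChunk("C", [[0.2, 0.0]], safety_confidence=0.81, advantage=0.40),
]

chosen = select_chunk(candidates)
rejected_all = select_chunk([candidates[1]])
print(chosen.name if chosen else "no chunk accepted")
print(rejected_all)
```

In this toy run, chunk B has the largest advantage but fails the safety gate, so C is selected; when B is the only candidate, nothing is accepted. Rejecting before execution is what would save the physical step or the rendering cost in such a scheme.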
