Action Hallucination in Generative Vision-Language-Action Models

Harold Soh; Eugene Lim

arXiv:2602.06339·cs.RO·May 13, 2026

TL;DR

This paper analyzes action hallucinations in generative vision-language-action models for robots, identifying structural causes and proposing directions to improve reliability without sacrificing expressiveness.

Contribution

The paper identifies structural causes of action hallucinations in latent-variable generative policies and offers mechanistic explanations for reported empirical failures, pointing to principled directions for improving reliability.

Findings

01

Action hallucinations can arise from three structural barriers: topological, precision, and horizon.

02

Structural mismatches between feasible robot behavior and common model architectures can produce actions that violate physical constraints.

03

The analysis suggests principled directions for improving the reliability of generative robot policies.

Abstract

Robot Foundation Models, such as VLAs, promise end-to-end generative robot policies with broad generalization. Yet it remains unclear whether they fundamentally resolve the core problem of action generation in embodied settings, or overcome the long-standing challenges of robotics. We address this question by analyzing action hallucinations that violate physical constraints and their extension to plan-level failures. Focusing on latent-variable generative policies, we show that hallucinations can arise from structural mismatches between feasible robot behavior and common model architectures. We study three such barriers -- topological, precision, and horizon -- and show how they impose unavoidable tradeoffs. Our analysis provides mechanistic explanations for reported empirical failures of generative robot policies and suggests principled directions for improving reliability and…
