Your Vision-Language-Action Model Already Has Attention Heads For Path Deviation Detection

Jaehwan Jeong; Evelyn Zhu; Jinying Lin; Emmanuel Jaimes; Tuan-Anh Vu; Jungseock Joo; Sangpil Kim; M. Khalid Jawed

arXiv:2603.13782·cs.RO·March 17, 2026

Open Access

TL;DR

This paper reveals that specific attention heads within a frozen vision-language-action model can be used to detect navigation path deviations in real time, enabling a training-free anomaly detection and recovery system for robots.

Contribution

The paper introduces a training-free method that detects navigation hallucinations by monitoring a few attention heads, improving robustness without additional computational overhead.

Findings

01 A combination of three attention heads detects 44.6% of deviations.

02 The detector has a false-positive rate of 11.7%.

03 The system was deployed successfully on a physical robot.
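
For readers unfamiliar with the two reported figures, the sketch below shows how a detection rate and a false-positive rate are conventionally computed from per-episode alarms and ground-truth labels. It assumes the standard definitions (detected deviations over all deviations, and false alarms over all deviation-free episodes); this page does not state how the paper computes them. The function name `detection_metrics` and the toy data are hypothetical and are not results from the paper.

```python
def detection_metrics(flags, labels):
    """Return (detection_rate, false_positive_rate).

    flags[i]  -- True if the detector raised an alarm on episode i
    labels[i] -- True if a path deviation really occurred on episode i
    Assumes the standard definitions; the paper's exact protocol is not
    given on this page.
    """
    pairs = list(zip(flags, labels))
    tp = sum(1 for f, y in pairs if f and y)          # deviations caught
    fn = sum(1 for f, y in pairs if not f and y)      # deviations missed
    fp = sum(1 for f, y in pairs if f and not y)      # false alarms
    tn = sum(1 for f, y in pairs if not f and not y)  # correct silence
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_positive_rate


# Hypothetical toy data: 4 episodes with a deviation, 6 without.
labels = [True] * 4 + [False] * 6
flags = [True, True, False, False, True, False, False, False, False, False]
rate, fpr = detection_metrics(flags, labels)
print(f"detection rate = {rate:.3f}, false-positive rate = {fpr:.3f}")
```

Reporting the two numbers together matters because either one can be improved in isolation by moving the alarm threshold, at the cost of the other.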

Abstract

Vision-Language-Action (VLA) models have shown strong potential for predicting semantic actions in navigation tasks and can reason over complex linguistic instructions and visual contexts. However, they are fundamentally hindered by visual-reasoning hallucinations that lead to trajectory deviations. Addressing this issue has conventionally required training external critic modules or relying on complex uncertainty heuristics. In this work, we discover that monitoring a few attention heads within a frozen VLA model can accurately detect path deviations without incurring additional computational overhead. We refer to these heads, which inherently capture the spatiotemporal causality between historical visual sequences and linguistic instructions, as Navigation Heads. Using these heads, we propose an intuitive, training-free anomaly-detection framework that…
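
To make the idea of "monitoring a few attention heads" concrete, here is a minimal sketch of one way a training-free check could look. It is not the paper's method, whose details are cut off above. Everything in it is an assumption made for illustration: that each monitored head yields one row of attention weights per step, that a score is the share of attention placed on history-frame tokens, that a low average score signals a deviation, and the threshold of 0.5. The names `head_score`, `is_deviation`, and `history_idx` are hypothetical.

```python
def head_score(attn_row, history_idx):
    """Share of one head's attention mass that falls on history-frame tokens.

    attn_row    -- attention weights of a single head over all key tokens
    history_idx -- positions of the history-frame tokens (assumed known)
    """
    total = sum(attn_row)
    if total == 0:
        return 0.0
    return sum(attn_row[i] for i in history_idx) / total


def is_deviation(attn_rows, history_idx, threshold=0.5):
    """Flag a step when the mean score over monitored heads is below threshold.

    The direction of the test (low score means deviation) and the threshold
    value are illustrative assumptions, not values from the paper.
    """
    scores = [head_score(row, history_idx) for row in attn_rows]
    return sum(scores) / len(scores) < threshold


# Hypothetical attention rows for three monitored heads over four tokens.
history_idx = [1, 2]
on_track = [[0.1, 0.6, 0.2, 0.1], [0.2, 0.5, 0.2, 0.1], [0.1, 0.5, 0.3, 0.1]]
off_track = [[0.6, 0.1, 0.1, 0.2], [0.5, 0.2, 0.1, 0.2], [0.7, 0.1, 0.1, 0.1]]
print(is_deviation(on_track, history_idx))
print(is_deviation(off_track, history_idx))
```

Because the check only reads attention weights the model already computes during its forward pass, a scheme of this shape needs no extra network and no gradient updates, which is consistent with the training-free, no-overhead claim in the abstract.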

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning