Restoring Linguistic Grounding in VLA Models via Train-Free Attention Recalibration

Ninghao Zhang, Bin Zhu, Shijie Zhou, Jingjing Chen

arXiv:2603.06001 · cs.RO · March 9, 2026

TL;DR

This paper identifies a bias in vision-language-action (VLA) models whereby they ignore instructions that contradict the visual scene and keep executing visually plausible actions. It introduces a diagnostic benchmark for evaluating this issue and proposes a train-free attention recalibration method that improves instruction adherence in robotic manipulation tasks.

Contribution

The paper reveals a critical failure mode in VLA models, termed linguistic blindness; introduces ICBench for its systematic evaluation; and proposes IGAR, a train-free method that mitigates this instruction-ignoring behavior.
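This summary does not spell out how IGAR recalibrates attention. As a purely illustrative sketch of what a training-free recalibration can look like, the snippet below assumes the simplest possible variant: at inference time, the attention logits that an action query assigns to instruction (language) tokens are shifted by log(alpha) before the softmax, which multiplies their unnormalized weights by alpha. The functions `recalibrate_attention` and `language_share` and the `alpha` parameter are hypothetical; this is not the paper's IGAR formulation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D sequence of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recalibrate_attention(
    logits: Sequence[float],
    is_language: Sequence[bool],
    alpha: float = 2.0,
) -> List[float]:
    """Hypothetical train-free recalibration (assumption, not IGAR itself).

    Adds log(alpha) to the logits of language tokens before the softmax.
    Since exp(x + log(alpha)) == alpha * exp(x), each language token's
    unnormalized weight is multiplied by alpha, and renormalization shifts
    attention mass from visual tokens toward the instruction. No weights
    are retrained; only the inference-time attention is changed.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if len(logits) != len(is_language):
        raise ValueError("logits and is_language must have the same length")
    bias = math.log(alpha)
    adjusted = [x + bias if lang else x for x, lang in zip(logits, is_language)]
    return softmax(adjusted)


def language_share(weights: Sequence[float], is_language: Sequence[bool]) -> float:
    """Total attention mass placed on language tokens."""
    return sum(w for w, lang in zip(weights, is_language) if lang)
```

For example, with logits `[2.0, 2.0, 0.0]` for two visual tokens and one language token, the language token receives 1/(2e² + 1) ≈ 0.063 of the attention under a plain softmax, and 1/(e² + 1) ≈ 0.119 after recalibration with `alpha=2.0`. Setting `alpha=1.0` leaves the attention unchanged, which makes the strength of the intervention a single inference-time knob.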

Findings

01

VLA models often keep executing visually plausible actions despite contradictory instructions.

02

IGAR significantly reduces errors caused by out-of-distribution (OOD) instruction contradictions.

03

The approach improves the safety and reliability of real-robot manipulation.

Abstract

Vision-Language-Action (VLA) models enable robots to perform manipulation tasks directly from natural language instructions and are increasingly viewed as a foundation for generalist robotic policies. However, their reliability under Out-of-Distribution (OOD) instructions remains underexplored. In this paper, we reveal a critical failure mode in which VLA policies continue executing visually plausible actions even when the language instruction contradicts the scene. We refer to this phenomenon as linguistic blindness, where VLA policies prioritize visual priors over instruction semantics during action generation. To systematically analyze this issue, we introduce ICBench, a diagnostic benchmark constructed from the LIBERO dataset that probes language-action coupling by injecting controlled OOD instruction contradictions while keeping the visual environment unchanged. Evaluations on…
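The abstract describes ICBench only at a high level: start from LIBERO episodes, perturb the instruction so that it contradicts the scene, and leave the visual environment unchanged. The sketch below illustrates that idea under stated assumptions. The single contradiction type (swapping the target object for one absent from the scene), the `Episode` type, `inject_contradiction`, and the `blind_execution_rate` metric are all hypothetical; they are not ICBench's actual perturbation families or evaluation metrics, and the example instruction is made up rather than taken from LIBERO.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Episode:
    instruction: str                # natural-language task description
    scene_objects: Tuple[str, ...]  # objects visible in the (unchanged) scene


def inject_contradiction(
    ep: Episode, target: str, distractor_pool: Sequence[str]
) -> Episode:
    """Hypothetical perturbation: replace `target` in the instruction with an
    object that is NOT present in the scene. Only the language changes; the
    scene is copied as-is, so any behavior change must come from the text."""
    if target not in ep.instruction:
        raise ValueError(f"{target!r} does not occur in the instruction")
    absent = [obj for obj in distractor_pool if obj not in ep.scene_objects]
    if not absent:
        raise ValueError("no distractor object is absent from the scene")
    new_instruction = ep.instruction.replace(target, absent[0], 1)
    return Episode(new_instruction, ep.scene_objects)


def blind_execution_rate(completed_original_task: Sequence[bool]) -> float:
    """Hypothetical metric: fraction of contradicted episodes in which the
    policy still carried out the ORIGINAL task, i.e. followed visual priors
    and ignored the instruction. Higher means more linguistically blind."""
    if not completed_original_task:
        return 0.0
    return sum(completed_original_task) / len(completed_original_task)
```

For instance, `inject_contradiction(Episode("put the bowl on the plate", ("bowl", "plate", "mug")), "bowl", ["mug", "banana"])` skips `"mug"` (it is in the scene) and yields the instruction `"put the banana on the plate"` over the same scene. A policy that still places the bowl on the plate in three of four such episodes would score a blind-execution rate of 0.75.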

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Advanced Neural Network Applications