LangGap: Diagnosing and Closing the Language Gap in Vision-Language-Action Models
Yuchen Hou, Lin Zhao

TL;DR
This paper introduces LangGap, a benchmark that diagnoses and improves language understanding in vision-language-action (VLA) models by applying semantic perturbations and diverse tasks under fixed layouts, and shows that current models largely ignore language instructions.
Contribution
The paper presents LangGap, a novel benchmark with semantic perturbations and diverse tasks to systematically evaluate and diagnose language understanding deficits in VLA models.
Findings
Targeted data augmentation improves success rate from 0% to 90%.
Multi-task training increases success rate from 0% to 28%.
Models struggle with increased semantic diversity, revealing fundamental limitations.
Abstract
Vision-Language-Action (VLA) models achieve over 95% success on standard benchmarks. However, through systematic experiments, we find that current state-of-the-art VLA models largely ignore language instructions. Prior work lacks: (1) systematic semantic perturbation diagnostics, (2) a benchmark that forces language understanding by design, and (3) linguistically diverse training data. This paper constructs the LangGap benchmark, based on a four-dimensional semantic perturbation method -- varying instruction semantics while keeping the tabletop layout fixed -- revealing language understanding deficits in π0.5. Existing benchmarks like LIBERO assign only one task per layout, underutilizing available objects and target locations; LangGap fully diversifies pick-and-place tasks under identical layouts, forcing models to truly understand language. Experiments show that targeted data…
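To make the idea of diversifying pick-and-place tasks under one fixed layout concrete, here is a minimal Python sketch. It is not the authors' implementation: the object names, target names, instruction template, and the `enumerate_tasks` helper are all hypothetical, and the paper's four perturbation dimensions are not reproduced here. The sketch only shows how a single layout can yield many instructions that differ in language alone.

```python
from itertools import product


def enumerate_tasks(objects, targets):
    """Return every pick-and-place task for one fixed layout.

    Hypothetical helper: the scene is identical across tasks, so only the
    instruction text tells a policy which object and target are intended.
    """
    tasks = []
    for obj, target in product(objects, targets):
        if obj == target:
            # An object cannot be placed on itself.
            continue
        tasks.append({
            "object": obj,
            "target": target,
            "instruction": f"pick up the {obj} and place it on the {target}",
        })
    return tasks


# Illustrative layout contents (assumed, not taken from the paper).
layout_objects = ["red bowl", "black mug", "butter"]
layout_targets = ["plate", "stove", "cabinet"]

tasks = enumerate_tasks(layout_objects, layout_targets)
for task in tasks:
    print(task["instruction"])
```

Because the visual scene is the same for every task, a policy that ignores the instruction can succeed on at most one of them, which is the property the benchmark relies on.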
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Advanced Neural Network Applications
