LIBERO-X: Robustness Litmus for Vision-Language-Action Models

Guodong Wang; Chenkai Zhang; Qingjie Liu; Jinjin Zhang; Jiancheng Cai; Junjie Liu; Xinmin Liu

arXiv:2602.06556 · cs.CV · February 9, 2026

TL;DR

LIBERO-X introduces a hierarchical benchmark with diverse training data to better evaluate the robustness and generalization of vision-language-action models under real-world complexities.

Contribution

The paper proposes a benchmarking framework that combines a progressive evaluation protocol with a diverse training dataset to improve the assessment of VLA models.

Findings

01. Models show significant performance drops under complex perturbations.

02. Hierarchical evaluation reveals specific weaknesses in scene understanding.

03. Diverse training data helps bridge the gap between training and real-world scenarios.

Abstract

Reliable benchmarking is critical for advancing Vision-Language-Action (VLA) models, as it reveals their generalization, robustness, and alignment of perception with language-driven manipulation tasks. However, existing benchmarks often provide limited or misleading assessments due to insufficient evaluation protocols that inadequately capture real-world distribution shifts. This work systematically rethinks VLA benchmarking from both evaluation and data perspectives, introducing LIBERO-X, a benchmark featuring: 1) A hierarchical evaluation protocol with progressive difficulty levels targeting three core capabilities: spatial generalization, object recognition, and task instruction understanding. This design enables fine-grained analysis of performance degradation under increasing environmental and task complexity; 2) A high-diversity training dataset collected via human teleoperation,…
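
The idea of analyzing performance degradation across progressive difficulty levels can be illustrated with a small sketch. This is not the LIBERO-X evaluation code: the function names, the `(capability, level, success)` record layout, and the toy episode data are all assumptions made for illustration. It computes a success rate per capability and difficulty level, then the drop relative to the easiest level of each capability.

```python
from collections import defaultdict


def success_by_level(episodes):
    """Success rate per (capability, level).

    `episodes` is an iterable of (capability, level, success) tuples.
    This record layout is hypothetical, not the benchmark's format.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [successes, total]
    for capability, level, success in episodes:
        counts[(capability, level)][0] += int(success)
        counts[(capability, level)][1] += 1
    return {key: s / n for key, (s, n) in counts.items()}


def degradation(rates):
    """Drop in success rate relative to each capability's easiest level."""
    drops = {}
    for cap in {c for c, _ in rates}:
        levels = sorted(lvl for c, lvl in rates if c == cap)
        baseline = rates[(cap, levels[0])]
        for lvl in levels:
            drops[(cap, lvl)] = baseline - rates[(cap, lvl)]
    return drops


# Toy, made-up rollouts: two episodes at each of two spatial levels.
episodes = [
    ("spatial", 1, True),
    ("spatial", 1, True),
    ("spatial", 2, True),
    ("spatial", 2, False),
]
rates = success_by_level(episodes)
drops = degradation(rates)
```

With this toy input, the success rate is 1.0 at level 1 and 0.5 at level 2, so the level-2 drop is 0.5. Reporting the drop per capability rather than one aggregate score is what makes the analysis fine-grained.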

Code & Models

Datasets

meituan/LIBERO-X
dataset · 1.1k dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Robot Manipulation and Learning