VLA-REPLICA: A Low-Cost, Reproducible Benchmark for Real-World Evaluation of Vision-Language-Action Models
Alex S. Huang, Jiahui Zhang, Shiqing Tang, Yu Xiang

TL;DR
VLA-REPLICA is a low-cost, reproducible real-world benchmark system for evaluating vision-language-action models in robotic manipulation. It targets the main limitations of existing real-world benchmarks: expensive hardware, centralized evaluation, and limited task diversity.
Contribution
We introduce VLA-REPLICA, a versatile, accessible benchmark built from off-the-shelf components for consistent evaluation of VLA models across labs.
Findings
Experiments reveal strengths and limitations of current VLA models.
The benchmark demonstrates high reproducibility across different setups.
The benchmark includes a diverse suite of manipulation tasks and a small-scale demonstration dataset for target-domain adaptation.
Abstract
Vision-Language-Action (VLA) models have shown strong promise for general-purpose robotic manipulation, but their real-world evaluation remains limited by a lack of accessible, reproducible, and consistent benchmarks. Simulation benchmarks fail to capture real-world complexity, while existing real-world benchmarks often require expensive hardware or centralized evaluation, or are limited in task diversity. We introduce VLA-REPLICA, a low-cost, easily reproducible real-world benchmark for evaluating VLA models. Built from off-the-shelf components, our system can be quickly assembled and replicated across laboratories, providing a consistent environment for policy evaluation anywhere in the world. VLA-REPLICA includes a diverse suite of manipulation tasks and a small-scale demonstration dataset for target-domain adaptation, with real-world evaluation protocols for both in-distribution and…
