GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data

Shengliang Deng; Mi Yan; Songlin Wei; Haixin Ma; Yuxin Yang; Jiayi Chen; Zhiqi Zhang; Taoyu Yang; Xuheng Zhang; Wenhao Zhang; Heming Cui; Zhizheng Zhang; He Wang

arXiv:2505.03233 · cs.RO · August 28, 2025

TL;DR

GraspVLA is a foundation model trained on a billion-scale synthetic dataset for robotic grasping, demonstrating strong zero-shot and few-shot generalization capabilities across real and simulated environments.

Contribution

This work introduces SynGrasp-1B, a billion-frame synthetic grasping dataset, and GraspVLA, a Vision-Language-Action (VLA) model pre-trained on it, enabling effective grasping with reduced reliance on real-world data.

Findings

01. Achieves state-of-the-art zero-shot grasping performance
02. Demonstrates effective transfer to real-world grasping tasks
03. Shows strong few-shot adaptability to human preferences

Abstract

Embodied foundation models are gaining increasing attention for their zero-shot generalization, scalability, and adaptability to new tasks through few-shot post-training. However, existing models rely heavily on real-world data, which is costly and labor-intensive to collect. Synthetic data offers a cost-effective alternative, yet its potential remains largely underexplored. To bridge this gap, we explore the feasibility of training Vision-Language-Action models entirely with large-scale synthetic action data. We curate SynGrasp-1B, a billion-frame robotic grasping dataset generated in simulation with photorealistic rendering and extensive domain randomization. Building on this, we present GraspVLA, a VLA model pretrained on large-scale synthetic action data as a foundational model for grasping tasks. GraspVLA integrates autoregressive perception tasks and flow-matching-based action…

Taxonomy

Topics: Stroke Rehabilitation and Recovery · Human Pose and Action Recognition