VacuumVLA: Boosting VLA Capabilities via a Unified Suction and Gripping Tool for Complex Robotic Manipulation

Hui Zhou; Siyuan Huang; Minxing Li; Hao Zhang; Lue Fan; Shaoshuai Shi

arXiv:2511.21557 · cs.RO · November 27, 2025

TL;DR

VacuumVLA introduces a hybrid robotic end effector that combines gripping and suction, enabling robots to perform complex tasks that are infeasible with standard two-finger grippers.

Contribution

The paper presents a low-cost, integrated hardware design that combines a mechanical two-finger gripper with a vacuum suction unit in a single end effector, expanding the range of tasks that vision-language-action (VLA) systems can perform.

Findings

01 Successful execution of complex tasks such as wiping glass surfaces and opening drawers without handles

02 Broader range of feasible tasks through switching between, or combining, gripping and suction in one end effector

03 Validation within state-of-the-art VLA frameworks

Abstract

Vision-Language-Action (VLA) models have significantly advanced general-purpose robotic manipulation by harnessing large-scale pretrained vision and language representations. Most current VLA systems employ parallel two-finger grippers as their default end effectors. However, such grippers face inherent limitations in certain real-world tasks, such as wiping glass surfaces or opening drawers without handles, due to insufficient contact area or lack of adhesion. To overcome these challenges, we present a low-cost, integrated hardware design that combines a mechanical two-finger gripper with a vacuum suction unit, enabling dual-mode manipulation within a single end effector. Our system supports flexible switching or synergistic use of both modalities, expanding the range of feasible tasks. We validate the efficiency and practicality of our design…
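
To make the idea of dual-mode control concrete, here is a minimal Python sketch of how a policy's output vector might be decoded into a command for a combined gripper-and-suction end effector. It is an illustration only, not the authors' implementation: the 8-value action layout, the 0.5 thresholds, and the names `HybridAction`, `decode_action`, and `active_mode` are all assumptions that the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class HybridAction:
    """Hypothetical command for a combined gripper-and-suction end effector."""
    arm_delta: Tuple[float, ...]  # assumed 6 values: translation and rotation deltas
    gripper_opening: float        # assumed range: 0.0 (closed) to 1.0 (fully open)
    suction_on: bool              # assumed binary vacuum switch


def decode_action(vec: Sequence[float], suction_threshold: float = 0.5) -> HybridAction:
    """Split an assumed 8-value policy output into arm, gripper, and suction parts."""
    if len(vec) != 8:
        raise ValueError(f"expected 8 action values, got {len(vec)}")
    arm_delta = tuple(float(v) for v in vec[:6])
    # Clamp the gripper channel into [0, 1] so out-of-range outputs stay valid.
    gripper_opening = min(1.0, max(0.0, float(vec[6])))
    # Threshold the last channel to get an on/off suction signal.
    suction_on = float(vec[7]) > suction_threshold
    return HybridAction(arm_delta, gripper_opening, suction_on)


def active_mode(action: HybridAction) -> str:
    """Label which modality (or combination) the command engages."""
    gripping = action.gripper_opening < 0.5  # assumed: mostly closed means gripping
    if gripping and action.suction_on:
        return "synergistic"
    if action.suction_on:
        return "suction"
    if gripping:
        return "grip"
    return "idle"
```

Under these assumptions, `active_mode(decode_action([0, 0, 0, 0, 0, 0, 0.9, 0.8]))` yields `"suction"`, which is the kind of command a glass-wiping task might use, while a closed gripper with suction enabled is labeled `"synergistic"`.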

Taxonomy

Topics: Robot Manipulation and Learning · Soft Robotics and Applications · Multimodal Machine Learning Applications