VacuumVLA: Boosting VLA Capabilities via a Unified Suction and Gripping Tool for Complex Robotic Manipulation
Hui Zhou, Siyuan Huang, Minxing Li, Hao Zhang, Lue Fan, Shaoshuai Shi

TL;DR
VacuumVLA introduces a hybrid robotic end effector combining gripping and suction to enhance manipulation capabilities, enabling robots to perform complex tasks previously unfeasible with standard grippers.
Contribution
The paper presents a novel integrated hardware design that combines a mechanical gripper with a vacuum suction unit, expanding the task range of vision language action systems.
Findings
Successful execution of complex tasks like wiping and drawer opening
Enhanced task versatility with hybrid end effector
Validation within state-of-the-art VLA frameworks
Abstract
Vision Language Action models have significantly advanced general purpose robotic manipulation by harnessing large scale pretrained vision and language representations. Among existing approaches, a majority of current VLA systems employ parallel two finger grippers as their default end effectors. However, such grippers face inherent limitations in handling certain real world tasks such as wiping glass surfaces or opening drawers without handles due to insufficient contact area or lack of adhesion. To overcome these challenges, we present a low cost, integrated hardware design that combines a mechanical two finger gripper with a vacuum suction unit, enabling dual mode manipulation within a single end effector. Our system supports flexible switching or synergistic use of both modalities, expanding the range of feasible tasks. We validate the efficiency and practicality of our design…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsRobot Manipulation and Learning · Soft Robotics and Applications · Multimodal Machine Learning Applications
