VILAS: A VLA-Integrated Low-cost Architecture with Soft Grasping for Robotic Manipulation

Zijian An; Hadi Khezam; Bill Cai; Ran Yang; Shijie Geng; Yiming Feng; Yue (Luna) Zheng; Lifeng Zhou

arXiv:2605.02037·cs.RO·May 5, 2026

TL;DR

VILAS is a low-cost, modular robotic platform supporting vision-language-action policy learning, featuring a soft gripper for fragile object manipulation, and validated through grape grasping experiments.

Contribution

The paper introduces VILAS, a cost-effective robotic system that combines a soft gripper extension with a unified ZMQ-based communication architecture, enabling end-to-end VLA policy deployment on accessible hardware.

Findings

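01. Three state-of-the-art VLA models (pi_0, pi_0.5, and GR00T N1.6) are deployed and evaluated on VILAS.

02. The kirigami-based soft gripper extension enables safe manipulation of fragile objects without explicit force sensing.

03. Experimental validation on a grape grasping task confirms the system's effectiveness.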

Abstract

We present VILAS, a fully low-cost, modular robotic manipulation platform designed to support end-to-end vision-language-action (VLA) policy learning and deployment on accessible hardware. The system integrates a Fairino FR5 collaborative arm, a Jodell RG52-50 electric gripper, and a dual-camera perception module, unified through a ZMQ-based communication architecture that seamlessly coordinates teleoperation, data collection, and policy deployment within a single framework. To enable safe manipulation of fragile objects without relying on explicit force sensing, we design a kirigami-based soft compliant gripper extension that induces predictable deformation under compressive loading, providing gentle and repeatable contact with delicate targets. We deploy and evaluate three state-of-the-art VLA models on the VILAS platform: pi_0, pi_0.5, and GR00T N1.6. All models are fine-tuned from…
