VILAS: A VLA-Integrated Low-cost Architecture with Soft Grasping for Robotic Manipulation
Zijian An, Hadi Khezam, Bill Cai, Ran Yang, Shijie Geng, Yiming Feng, Yue (Luna) Zheng, Lifeng Zhou

TL;DR
VILAS is a low-cost, modular robotic platform for vision-language-action (VLA) policy learning and deployment. It features a soft gripper for manipulating fragile objects and is validated through grape-grasping experiments.
Contribution
The paper introduces VILAS, a cost-effective robotic system that combines a kirigami-based soft gripper extension with a unified ZMQ-based communication architecture, enabling end-to-end VLA policy deployment on accessible hardware.
Findings
Three state-of-the-art VLA models (pi_0, pi_0.5, and GR00T N1.6) are deployed and evaluated on VILAS.
The kirigami-based soft gripper extension enables safe manipulation of fragile objects without explicit force sensing.
Experiments on a grape-grasping task validate the system.
Abstract
We present VILAS, a fully low-cost, modular robotic manipulation platform designed to support end-to-end vision-language-action (VLA) policy learning and deployment on accessible hardware. The system integrates a Fairino FR5 collaborative arm, a Jodell RG52-50 electric gripper, and a dual-camera perception module, unified through a ZMQ-based communication architecture that seamlessly coordinates teleoperation, data collection, and policy deployment within a single framework. To enable safe manipulation of fragile objects without relying on explicit force sensing, we design a kirigami-based soft compliant gripper extension that induces predictable deformation under compressive loading, providing gentle and repeatable contact with delicate targets. We deploy and evaluate three state-of-the-art VLA models on the VILAS platform: pi_0, pi_0.5, and GR00T N1.6. All models are fine-tuned from…
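The abstract describes one communication layer shared by teleoperation, data collection, and policy deployment. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation: the `Command` envelope, its field names, the `"teleop"`/`"policy"` labels, the normalized gripper range, and the six-joint check are all hypothetical. The ZMQ transport itself is left out; in a real setup the encoded bytes would be sent and received over a ZMQ socket.

```python
import json
from dataclasses import dataclass, asdict
from typing import List

NUM_JOINTS = 6  # assumption: a six-axis arm


@dataclass
class Command:
    """Hypothetical message envelope shared by every command source."""
    source: str                 # assumed labels: "teleop" or "policy"
    joint_targets: List[float]  # one target per arm joint
    gripper: float              # assumed normalized opening in [0, 1]


def encode(cmd: Command) -> bytes:
    """Serialize a command to the bytes that would go on the wire."""
    return json.dumps(asdict(cmd)).encode("utf-8")


def decode(payload: bytes) -> Command:
    """Rebuild a command from received bytes."""
    return Command(**json.loads(payload.decode("utf-8")))


class Recorder:
    """Collects every executed command so an episode can be saved later."""

    def __init__(self) -> None:
        self.episode: List[dict] = []

    def log(self, cmd: Command) -> None:
        self.episode.append(asdict(cmd))


def robot_side_step(payload: bytes, recorder: Recorder) -> Command:
    """Validate and log one command, regardless of who produced it."""
    cmd = decode(payload)
    if len(cmd.joint_targets) != NUM_JOINTS:
        raise ValueError("expected one target per joint")
    if not 0.0 <= cmd.gripper <= 1.0:
        raise ValueError("gripper opening must lie in [0, 1]")
    recorder.log(cmd)
    return cmd


if __name__ == "__main__":
    recorder = Recorder()
    # A teleoperation device and a learned policy emit the same envelope,
    # so the robot-side loop and the data recorder need no special cases.
    for source in ("teleop", "policy"):
        wire = encode(Command(source, [0.0] * NUM_JOINTS, 0.5))
        robot_side_step(wire, recorder)
    print(len(recorder.episode), "commands recorded")
```

Because both producers share one envelope, switching from teleoperated data collection to policy rollout changes only who sends the message, not how the robot side handles it.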
