Bridging Embodiment Gaps: Deploying Vision-Language-Action Models on Soft Robots

Haochen Su; Cristian Meo; Francesco Stella; Andrea Peirone; Kai Junge; Josie Hughes

arXiv:2510.17369·cs.RO·October 21, 2025

Open Access

TL;DR

This paper demonstrates that with targeted finetuning, vision-language-action models can be effectively deployed on soft robots, enabling safe and adaptable human-robot interactions in unstructured environments.

Contribution

The paper introduces a finetuning and deployment pipeline for VLA models on a soft continuum manipulator and shows that finetuning bridges the embodiment gap, enabling safe, flexible control.

Findings

01

Finetuning enables soft robots to match rigid robot performance.

02

Out-of-the-box VLA policies fail on soft robots due to embodiment mismatch.

03

Coupling VLA models with soft robots allows safe human-robot interaction.

Abstract

Robotic systems are increasingly expected to operate in human-centered, unstructured environments where safety, adaptability, and generalization are essential. Vision-Language-Action (VLA) models have been proposed as a language-guided, generalized control framework for real robots. However, their deployment has been limited to conventional serial-link manipulators. Given the rigidity of these manipulators and the unpredictability of learning-based control, the ability to interact safely with the environment is missing yet critical. In this work, we present the deployment of a VLA model on a soft continuum manipulator to demonstrate autonomous, safe human-robot interaction. We present a structured finetuning and deployment pipeline evaluating two state-of-the-art VLA models (OpenVLA-OFT and $\pi_{0}$) across representative manipulation tasks, and show that while out-of-the-box policies fail due to embodiment…

Taxonomy

Topics: Robot Manipulation and Learning · Soft Robotics and Applications · Multimodal Machine Learning Applications