TL;DR
ManiSoft introduces a comprehensive benchmark and simulation environment for vision-language manipulation tasks involving soft robotic arms, addressing challenges like deformability and contact-rich interactions.
Contribution
The paper presents ManiSoft, a benchmark for vision-language manipulation with soft arms, comprising a tailored simulator, four tasks, and an automated pipeline that generates scenes for policy training and evaluation.
Findings
Current policies achieve promising results in clean scenes
Performance drops significantly under randomization
Failures stem mainly from visual estimation errors and limited exploitation of deformability
Abstract
Most existing vision-language manipulation research targets rigid robotic arms, whose fixed morphology limits adaptability in cluttered or confined spaces. Soft robotic arms offer an appealing alternative due to their deformability, but confront challenges such as unreliable proprioception and distributed low-level actuation. To investigate these challenges, we introduce ManiSoft, a benchmark for vision-language manipulation with soft arms. ManiSoft features a tailored simulator that couples realistic soft-body dynamics with contact-rich interactions via an elastic force constraint. On this basis, ManiSoft defines four tasks, each highlighting distinct aspects of deformable control, from basic end-effector coordination to obstacle avoidance. To support policy training and evaluation, ManiSoft includes an automated pipeline that generates diverse scenes and corresponding…
