CoFreeVLA: Collision-Free Dual-Arm Manipulation via Vision-Language-Action Model and Risk Estimation
Xuanran Zhai, Binkai Ou, Qiaojun Yu, Ce Hao, Yaohua Liu

TL;DR
CoFreeVLA augments an end-to-end vision-language-action (VLA) model with a short-horizon self-collision risk estimator, reducing self-collisions and improving task success rates on five bimanual manipulation tasks.
Contribution
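It introduces a risk estimation module that predicts self-collision likelihood from proprioception, visual embeddings, and planned actions, and uses that estimate to gate risky commands and guide recovery to safe states, improving safety and success rates over RDT and APEX.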
Findings
Reduces self-collisions in dual-arm tasks
Improves success rates on bimanual manipulation
Effective risk-guided adjustments enhance safety
Abstract
Vision-Language-Action (VLA) models enable instruction-following manipulation, yet dual-arm deployment remains unsafe due to under-modeled self-collisions between arms and grasped objects. We introduce CoFreeVLA, which augments an end-to-end VLA with a short-horizon self-collision risk estimator that predicts collision likelihood from proprioception, visual embeddings, and planned actions. The estimator gates risky commands, recovers to safe states via risk-guided adjustments, and shapes policy refinement for safer rollouts. It is pre-trained with model-based collision labels and post-trained on real-robot rollouts for calibration. On five bimanual tasks with the PiPER robot arm, CoFreeVLA reduces self-collisions and improves success rates versus RDT and APEX.
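The gating step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the logistic scoring model, the fixed weights, the 0.5 threshold, and the names `estimate_risk` and `gate_action` are all assumptions made for illustration. The paper's estimator is a learned model whose architecture is not given here.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def estimate_risk(
    proprio: Sequence[float],
    visual: Sequence[float],
    action: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Hypothetical stand-in for a learned risk estimator.

    Concatenates proprioception, a visual embedding, and the planned
    action, then scores them with a logistic model (an assumption).
    """
    features = [*proprio, *visual, *action]
    if len(features) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    logit = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid(logit)


def gate_action(
    proposed: Sequence[float],
    fallback: Sequence[float],
    risk: float,
    threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[float]:
    """Pass the proposed command through only if the risk is below threshold."""
    return list(fallback) if risk >= threshold else list(proposed)


# Usage: a zero-weight model is maximally uncertain (risk 0.5), so the
# proposed command is replaced by the fallback at the default threshold.
risk = estimate_risk([0.1, 0.2], [0.3], [0.4], weights=[0.0, 0.0, 0.0, 0.0])
command = gate_action(proposed=[0.4], fallback=[0.0], risk=risk)
```

In this sketch the fallback is simply a caller-supplied safe command. The abstract describes a richer recovery step (risk-guided adjustments toward safe states), which is not modeled here.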
Taxonomy
Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Social Robot Interaction and HRI
