Bi-VLA: Vision-Language-Action Model-Based System for Bimanual Robotic Dexterous Manipulations

Koffivi Fidèle Gbagbe, Miguel Altamirano Cabrera, Ali Alabbas, Oussama Alyunes, Artem Lykov, and Dzmitry Tsetserukou

arXiv:2405.06039 · cs.RO · August 20, 2024

TL;DR

This paper presents Bi-VLA, a comprehensive vision-language-action system enabling bimanual robots to understand complex instructions, perceive visual scenes, and perform dexterous household tasks with high accuracy and adaptability.

Contribution

The novel Bi-VLA system integrates vision, language, and action modules for robotic manipulation, demonstrating effective real-world task execution and high success rates.

Findings

1. 100% success rate in generating correct executable code
2. 96.06% accuracy in ingredient detection
3. 83.4% overall task success rate

Abstract

This research introduces the Bi-VLA (Vision-Language-Action) model, a novel system designed for bimanual robotic dexterous manipulation that seamlessly integrates vision for scene understanding, language comprehension for translating human instructions into executable code, and physical action generation. We evaluated the system's functionality through a series of household tasks, including the preparation of a desired salad upon human request. Bi-VLA demonstrates the ability to interpret complex human instructions, perceive and understand the visual context of ingredients, and execute precise bimanual actions to prepare the requested salad. We assessed the system's performance in terms of accuracy, efficiency, and adaptability to different salad recipes and human preferences through a series of experiments. Our results show a 100% success rate in generating the correct executable code…

Taxonomy

Topics: Robot Manipulation and Learning