VoxAct-B: Voxel-Based Acting and Stabilizing Policy for Bimanual Manipulation

I-Chun Arthur Liu; Sicheng He; Daniel Seita; Gaurav Sukhatme

arXiv:2407.04152 · cs.RO · October 8, 2024

TL;DR

VoxAct-B introduces a voxel-based, language-conditioned policy leveraging Vision Language Models to improve efficiency and generalization in bimanual robotic manipulation tasks, demonstrated in simulation and real-world scenarios.

Contribution

The paper presents VoxAct-B, a language-conditioned, voxel-based approach that uses Vision Language Models to enable more efficient policy learning for bimanual manipulation.

Findings

01. Outperforms baselines in simulation tasks
02. Effective on real-world Open Drawer and Open Jar tasks
03. Demonstrates improved generalization across tasks

Abstract

Bimanual manipulation is critical to many robotics applications. In contrast to single-arm manipulation, bimanual manipulation tasks are challenging due to higher-dimensional action spaces. Prior works leverage large amounts of data and primitive actions to address this problem, but may suffer from sample inefficiency and limited generalization across various tasks. To this end, we propose VoxAct-B, a language-conditioned, voxel-based method that leverages Vision Language Models (VLMs) to prioritize key regions within the scene and reconstruct a voxel grid. We provide this voxel grid to our bimanual manipulation policy to learn acting and stabilizing actions. This approach enables more efficient policy learning from voxels and is generalizable to different tasks. In simulation, we show that VoxAct-B outperforms strong baselines on fine-grained bimanual manipulation tasks. Furthermore,…
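The idea of building a voxel grid around a prioritized region can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `voxelize`, its parameters, and the use of a plain occupancy grid are assumptions made for illustration, and the `center` argument merely stands in for whatever region a VLM would select. The sketch only shows that shrinking the crop around a region of interest yields smaller voxels, and therefore finer detail, at the same grid resolution.

```python
import numpy as np

def voxelize(points, center, extent, resolution):
    """Build a boolean occupancy grid for a cube of side `extent` at `center`.

    Hypothetical helper for illustration; `center` stands in for a region
    of interest that some upstream model (e.g. a VLM) would provide.
    """
    points = np.asarray(points, dtype=float).reshape(-1, 3)
    center = np.asarray(center, dtype=float)
    lower = center - extent / 2.0          # minimum corner of the cube
    voxel_size = extent / resolution       # edge length of one voxel
    idx = np.floor((points - lower) / voxel_size).astype(int)
    # Keep only points whose voxel index falls inside the grid.
    inside = np.all((idx >= 0) & (idx < resolution), axis=1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[tuple(idx[inside].T)] = True
    return grid

# Toy point cloud: two points near the origin and one far away.
cloud = [[0.1, 0.1, 0.1], [-0.9, -0.9, -0.9], [5.0, 5.0, 5.0]]

# Wide crop: coarse voxels (0.5 per side) covering a 2.0-wide cube.
wide = voxelize(cloud, center=[0, 0, 0], extent=2.0, resolution=4)

# Tight crop around the same center: finer voxels (0.125 per side).
tight = voxelize(cloud, center=[0, 0, 0], extent=0.5, resolution=4)

print(wide.sum(), tight.sum())
```

With a fixed resolution, the tight crop discards points outside the region but represents what remains with smaller voxels, which is the trade-off a region-prioritizing step exploits.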

Code & Models

Repositories

VoxAct-B/voxactb
PyTorch · Official

Datasets

arthur801031/voxact-b
dataset · 13 dl

Taxonomy

Topics: Neuroscience and Neural Engineering