RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation

Fanfan Liu; Feng Yan; Liming Zheng; Chengjian Feng; Yiyang Huang; Lin Ma

arXiv:2406.18977·cs.RO·September 13, 2024

TL;DR

RoboUniView decouples visual feature extraction from action learning by introducing a unified view representation for robotic manipulation. This improves generalization across different camera setups and raises performance on the CALVIN benchmark.

Contribution

The paper proposes RoboUniView, a novel approach that learns a unified view representation from multiple perspectives, enabling better generalization and platform independence in robotic manipulation tasks.
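
To make the idea of a representation that does not depend on camera parameters concrete, here is a toy sketch. It is not the RoboUniView implementation and makes strong simplifying assumptions: pinhole cameras with translation-only poses, scalar per-pixel features, and plain averaging across views. All names (`Camera`, `project`, `render`, `unified_view`) are hypothetical. The point it illustrates is that features indexed by world-frame grid points come out the same for two different camera rigs, so a downstream action model would not need to know which rig produced them.

```python
# Illustrative toy only; names and fusion rule are assumptions, not the paper's method.
from dataclasses import dataclass


@dataclass
class Camera:
    """Hypothetical pinhole camera looking along +z, with a translation-only pose."""
    f: float          # focal length in pixels
    cx: float         # principal point, x
    cy: float         # principal point, y
    position: tuple   # camera centre (x, y, z) in world coordinates
    width: int
    height: int


def project(cam, point):
    """Project a world point to integer pixel coordinates, or None if not visible."""
    x = point[0] - cam.position[0]
    y = point[1] - cam.position[1]
    z = point[2] - cam.position[2]
    if z <= 0:  # behind the camera
        return None
    u = int(round(cam.f * x / z + cam.cx))
    v = int(round(cam.f * y / z + cam.cy))
    if 0 <= u < cam.width and 0 <= v < cam.height:
        return u, v
    return None


def render(cam, grid_points, values):
    """Build a synthetic feature map by writing each point's value at its pixel."""
    fmap = [[0.0] * cam.width for _ in range(cam.height)]
    for p, val in zip(grid_points, values):
        uv = project(cam, p)
        if uv is not None:
            u, v = uv
            fmap[v][u] = val
    return fmap


def unified_view(grid_points, cameras, feature_maps):
    """For each world-frame grid point, average the features of cameras that see it."""
    fused = []
    for p in grid_points:
        samples = []
        for cam, fmap in zip(cameras, feature_maps):
            uv = project(cam, p)
            if uv is not None:
                u, v = uv
                samples.append(fmap[v][u])
        fused.append(sum(samples) / len(samples) if samples else 0.0)
    return fused


# A 3x3 grid of world points on the plane z = 2, each carrying a distinct value.
grid = [(x, y, 2.0) for y in (-0.5, 0.0, 0.5) for x in (-0.5, 0.0, 0.5)]
values = [float(i) for i in range(len(grid))]

# Two rigs with different intrinsics, resolution, and mounting position.
cam_a = Camera(f=4.0, cx=4.0, cy=4.0, position=(0.0, 0.0, 0.0), width=9, height=9)
cam_b = Camera(f=8.0, cx=8.0, cy=8.0, position=(0.5, 0.0, -2.0), width=17, height=17)
map_a = render(cam_a, grid, values)
map_b = render(cam_b, grid, values)

fused_a = unified_view(grid, [cam_a], [map_a])
fused_b = unified_view(grid, [cam_b], [map_b])
```

The two feature maps differ in size and pixel layout, yet `fused_a` and `fused_b` are indexed by the same world-frame grid. A learned model would replace the simple averaging with something far richer; the sketch only shows why indexing by world coordinates removes the dependence on camera intrinsics and mounting position.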

Findings

1. Achieves state-of-the-art success rates on the CALVIN benchmark.
2. Maintains high performance under unseen camera parameters.
3. Supports joint cross-task learning across datasets.

Abstract

Utilizing Vision-Language Models (VLMs) for robotic manipulation represents a novel paradigm, aiming to enhance the model's ability to generalize to new objects and instructions. However, due to variations in camera specifications and mounting positions, existing methods exhibit significant performance disparities across different robotic platforms. To address this challenge, we propose RoboUniView in this paper, an innovative approach that decouples visual feature extraction from action learning. We first learn a unified view representation from multi-perspective views by pre-training on readily accessible data, and then derive actions from this unified view representation to control robotic manipulation. This unified view representation more accurately mirrors the physical world and is not constrained by the robotic platform's camera parameters. Thanks to this methodology, we achieve…

Code & Models

Repositories

liufanfanlff/robouniview
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques