Multi-View Masked World Models for Visual Robotic Manipulation

Younggyo Seo; Junsu Kim; Stephen James; Kimin Lee; Jinwoo Shin; Pieter Abbeel

arXiv:2302.02408 · cs.RO · June 1, 2023 · 6 cites

TL;DR

This paper introduces a multi-view masked autoencoder that learns visual representations from multiple camera views, together with a world model trained on those representations, enabling robotic manipulation and transfer of policies to real-robot tasks without camera calibration.

Contribution

We propose a multi-view masked autoencoder for learning representations from multi-view data, improving robotic manipulation and policy transfer in uncalibrated, viewpoint-randomized scenarios.
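To make "viewpoint-randomized" concrete, here is a minimal sketch of per-episode camera randomization, in which a nominal camera pose is perturbed uniformly on each axis. The `CameraPose` type, the `randomize_viewpoint` function, the pose convention, and the perturbation bounds are all hypothetical; they are not taken from the MV-MWM code or from the paper's exact experimental setup.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraPose:
    # Hypothetical convention: position in metres, orientation offsets in degrees.
    x: float
    y: float
    z: float
    yaw: float
    pitch: float


def randomize_viewpoint(nominal: CameraPose, pos_range: float, angle_range: float,
                        rng: random.Random) -> CameraPose:
    """Sample a camera pose uniformly around `nominal`.

    `pos_range` (metres) and `angle_range` (degrees) are hypothetical bounds
    on the perturbation applied independently to each axis.
    """
    return CameraPose(
        x=nominal.x + rng.uniform(-pos_range, pos_range),
        y=nominal.y + rng.uniform(-pos_range, pos_range),
        z=nominal.z + rng.uniform(-pos_range, pos_range),
        yaw=nominal.yaw + rng.uniform(-angle_range, angle_range),
        pitch=nominal.pitch + rng.uniform(-angle_range, angle_range),
    )
```

For example, `randomize_viewpoint(CameraPose(1.0, 0.0, 1.5, 0.0, -30.0), 0.1, 15.0, random.Random(0))` returns a pose within 10 cm and 15 degrees of the nominal one. Because the camera moves from episode to episode, a policy trained this way cannot rely on a single calibrated viewpoint.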

Findings

01. Effective multi-view control demonstrated

02. Robust policy transfer without camera calibration

03. Enhanced representation learning from multi-view data

Abstract

Visual robotic manipulation research and applications often use multiple cameras, or views, to better perceive the world. How else can we utilize the richness of multi-view data? In this paper, we investigate how to learn good representations with multi-view data and utilize them for visual robotic manipulation. Specifically, we train a multi-view masked autoencoder which reconstructs pixels of randomly masked viewpoints and then learn a world model operating on the representations from the autoencoder. We demonstrate the effectiveness of our method in a range of scenarios, including multi-view control and single-view control with auxiliary cameras for representation learning. We also show that the multi-view masked autoencoder trained with multiple randomized viewpoints enables training a policy with strong viewpoint randomization and transferring the policy to solve real-robot tasks…
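The view-masking step described in the abstract can be sketched as follows, assuming each view is split into a fixed number of image patches: one randomly chosen view is hidden entirely, a further fraction of the remaining patches is masked at random, and the reconstruction error is measured on the hidden patches. The function names, the `extra_mask_ratio` parameter, and the choice to compute the loss only on masked patches (as in the original MAE) are assumptions for illustration, not the authors' implementation.

```python
import random


def multi_view_mask(num_views: int, patches_per_view: int, extra_mask_ratio: float,
                    rng: random.Random) -> list[list[bool]]:
    """Return mask[v][p] == True where patch p of view v is hidden from the encoder.

    One randomly chosen view is masked entirely (view masking); a fraction
    `extra_mask_ratio` (hypothetical parameter) of the patches in the other
    views is also masked at random.
    """
    if num_views < 2:
        raise ValueError("view masking needs at least two views")
    masked_view = rng.randrange(num_views)
    mask = [[v == masked_view] * patches_per_view for v in range(num_views)]
    # Patches that are still visible after hiding the chosen view.
    visible = [(v, p) for v in range(num_views) if v != masked_view
               for p in range(patches_per_view)]
    num_extra = int(len(visible) * extra_mask_ratio)
    for v, p in rng.sample(visible, num_extra):
        mask[v][p] = True
    return mask


def masked_reconstruction_loss(target, prediction, mask) -> float:
    """Mean squared pixel error over masked patches only (MAE-style, assumed here).

    `target[v][p]` and `prediction[v][p]` are flat lists of pixel values.
    """
    total, count = 0.0, 0
    for v, view_mask in enumerate(mask):
        for p, hidden in enumerate(view_mask):
            if hidden:
                for t, y in zip(target[v][p], prediction[v][p]):
                    total += (t - y) ** 2
                    count += 1
    return total / count if count else 0.0
```

With two cameras and 16 patches per view, `multi_view_mask(2, 16, 0.5, random.Random(0))` hides one full view and half of the other view's patches, so the autoencoder has to infer the hidden viewpoint from what the remaining camera sees. In the method summarized above, a world model then operates on the resulting representations.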

Code & Models

Repositories

younggyoseo/MV-MWM (TensorFlow, official)

Videos

Multi-View Masked World Models for Visual Robotic Manipulation · SlidesLive

Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Image Processing Techniques and Applications