Visual Reinforcement Learning with Self-Supervised 3D Representations
Yanjie Ze, Nicklas Hansen, Yinbo Chen, Mohit Jain, Xiaolong Wang

TL;DR
This paper introduces a 3D self-supervised learning framework for visual reinforcement learning that improves sample efficiency and enables zero-shot transfer to real robots in manipulation tasks.
Contribution
It presents a two-phase framework: a deep voxel-based 3D autoencoder is pretrained on a large object-centric dataset, and the representation is then finetuned jointly with RL on in-domain data, which improves sample efficiency and transfer.
Findings
Improved sample efficiency in simulated manipulation tasks.
Zero-shot transfer to a real robot with an uncalibrated RGB camera.
Successful grasping and lifting in real-world experiments.
Abstract
A prominent approach to visual Reinforcement Learning (RL) is to learn an internal state representation using self-supervised methods, which has the potential benefit of improved sample-efficiency and generalization through additional learning signal and inductive biases. However, while the real world is inherently 3D, prior efforts have largely been focused on leveraging 2D computer vision techniques as auxiliary self-supervision. In this work, we present a unified framework for self-supervised learning of 3D representations for motor control. Our proposed framework consists of two phases: a pretraining phase where a deep voxel-based 3D autoencoder is pretrained on a large object-centric dataset, and a finetuning phase where the representation is jointly finetuned together with RL on in-domain data. We empirically show that our method enjoys improved sample efficiency in simulated…
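The two-phase structure described in the abstract can be illustrated with a minimal Python sketch. It shows only which objectives are active in each phase, not the authors' implementation. The names `LossTerms`, `total_loss`, and `aux_weight` are hypothetical, and combining the RL loss with the 3D auxiliary loss as a weighted sum is an assumption; the abstract states only that the representation is finetuned jointly with RL.

```python
from dataclasses import dataclass


@dataclass
class LossTerms:
    """Scalar loss values for one training step (illustrative placeholders)."""
    reconstruction_3d: float  # self-supervised loss of the voxel-based 3D autoencoder
    rl: float                 # loss of the RL algorithm on in-domain data


def total_loss(phase: str, terms: LossTerms, aux_weight: float = 1.0) -> float:
    """Return the objective that is optimized in the given phase.

    Assumption: joint finetuning is modeled as a weighted sum of the RL loss
    and the 3D auxiliary loss. The weighting scheme is hypothetical.
    """
    if phase == "pretrain":
        # Phase 1: only the 3D self-supervised objective, on object-centric data.
        return terms.reconstruction_3d
    if phase == "finetune":
        # Phase 2: RL objective plus the 3D objective as auxiliary self-supervision.
        return terms.rl + aux_weight * terms.reconstruction_3d
    raise ValueError(f"unknown phase: {phase!r}")


# Usage: the same loss terms evaluated under each phase.
step = LossTerms(reconstruction_3d=2.0, rl=5.0)
pretrain_objective = total_loss("pretrain", step)
finetune_objective = total_loss("finetune", step, aux_weight=0.5)
```

In this sketch the pretraining phase ignores the RL term entirely, while the finetuning phase keeps the 3D objective as an auxiliary signal so the encoder continues to receive self-supervision on in-domain data.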
Taxonomy
Topics: Reinforcement Learning in Robotics · Robotic Locomotion and Control · Robot Manipulation and Learning
