What do we learn from a large-scale study of pre-trained visual representations in sim and real environments?

Sneha Silwal; Karmesh Yadav; Tingfan Wu; Jay Vakil; Arjun Majumdar; Sergio Arnaud; Claire Chen; Vincent-Pierre Berges; Dhruv Batra; Aravind Rajeswaran; Mrinal Kalakrishnan; Franziska Meier; Oleksandr Maksymets

arXiv:2310.02219·cs.RO·July 16, 2024

TL;DR

This large-scale empirical study evaluates five pre-trained visual representations across multiple tasks, robots, and learning paradigms, revealing insights into their transferability from simulation to real-world applications.

Contribution

It provides the first comprehensive analysis of PVRs in real-world tasks, demonstrating their transferability and the impact of data augmentation and fine-tuning.

Findings

01

Simulation performance trends predict real-world outcomes.

02

PVRs enable zero-shot transfer to a held-out real-world scene in indoor ImageNav.

03

Data augmentation and fine-tuning improve real-world performance.

Abstract

We present a large empirical investigation into the use of pre-trained visual representations (PVRs) for training downstream policies that execute real-world tasks. Our study covers five different PVRs, evaluated on five distinct manipulation and indoor navigation tasks, using three different robots and two different policy learning paradigms. From this effort, we arrive at three insights: 1) the performance trends of PVRs in simulation are generally indicative of their trends in the real world, 2) the use of PVRs enables a first-of-its-kind result with indoor ImageNav (zero-shot transfer to a held-out scene in the real world), and 3) the benefits from variations in PVRs, primarily data augmentation and fine-tuning, also transfer to real-world performance. See the project website for additional details and visuals.
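One way to make the first insight concrete is to correlate per-model success rates measured in simulation with those measured on real hardware. The sketch below is a minimal illustration using a plain Pearson correlation. The function name `pearson`, the lists `sim_success` and `real_success`, and every number in them are hypothetical placeholders, not results or methods taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder success rates for five hypothetical PVRs on one task.
# These values are invented for illustration only.
sim_success = [0.40, 0.55, 0.60, 0.70, 0.85]
real_success = [0.30, 0.50, 0.45, 0.65, 0.80]

r = pearson(sim_success, real_success)
print(f"sim-vs-real correlation: {r:.2f}")
```

A value near 1 would indicate that models ranking well in simulation also tend to rank well in the real world; with only a handful of models, such a number should be read as a rough trend rather than a precise estimate.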

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning