Bridging the Sim2Real Gap: Vision Encoder Pre-Training for Visuomotor Policy Transfer
Yash Yardi, Samuel Biruduganti, Lars Ankile

TL;DR
This paper evaluates how large-scale pre-trained vision encoders can improve the transfer of visuomotor policies from simulation to real-world robotic tasks by analyzing their feature extraction and invariance properties.
Contribution
It introduces an offline framework for evaluating how suitable vision encoders are for Sim2Real transfer, and it identifies the architectural and pre-training factors that influence transfer success.
Findings
Manipulation-pretrained encoders outperform others in Action Score
CNN-based encoders exhibit stronger domain invariance than ViTs
Combining properties of different encoders enhances transferability
Abstract
Simulation offers a scalable and efficient alternative to real-world data collection for learning visuomotor robotic policies. However, the simulation-to-reality (Sim2Real) distribution shift, introduced by deploying simulation-trained policies in real-world environments, frequently prevents successful policy transfer. We present an offline framework for evaluating how well large-scale pre-trained vision encoders address the Sim2Real gap. We examine a diverse collection of encoders, assessing their ability to extract features necessary for robot control (Action Score) while remaining invariant to task-irrelevant environmental variations (Domain Invariance Score). Evaluating 23 encoders, we reveal patterns across architectures, pre-training datasets, and parameter scales. Our findings show that manipulation-pretrained encoders consistently achieve higher Action…
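The abstract names two offline scores but this summary does not define them. As a rough illustration only, the sketch below assumes that an Action Score can be approximated by how well a linear probe predicts actions from frozen encoder features, and that a Domain Invariance Score can be approximated by the cosine similarity between embeddings of paired simulated and real frames. Both function names, the ridge probe, and the paired-frame setup are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def action_score(features, actions, train_frac=0.8, ridge=1e-3):
    """Hypothetical proxy for an Action Score (not the paper's definition).

    Fits a ridge-regularised linear probe from frozen encoder features to
    actions on the first part of the data and returns R^2 on the remainder.
    """
    features = np.asarray(features, dtype=float)
    actions = np.asarray(actions, dtype=float)
    split = int(len(features) * train_frac)

    # Append a constant column so the probe can learn a bias term.
    X = np.hstack([features, np.ones((len(features), 1))])
    X_tr, X_te = X[:split], X[split:]
    Y_tr, Y_te = actions[:split], actions[split:]

    # Closed-form ridge regression: (X^T X + ridge * I) W = X^T Y.
    gram = X_tr.T @ X_tr + ridge * np.eye(X.shape[1])
    W = np.linalg.solve(gram, X_tr.T @ Y_tr)

    residual = np.sum((Y_te - X_te @ W) ** 2)
    total = np.sum((Y_te - Y_te.mean(axis=0)) ** 2)
    return float(1.0 - residual / total)


def domain_invariance_score(sim_features, real_features, eps=1e-12):
    """Hypothetical proxy for a Domain Invariance Score (an assumption).

    Row i of each array is assumed to embed the same scene, rendered once in
    simulation and once captured in the real world. Returns the mean cosine
    similarity between paired embeddings (1.0 means identical directions).
    """
    sim = np.asarray(sim_features, dtype=float)
    real = np.asarray(real_features, dtype=float)
    sim = sim / (np.linalg.norm(sim, axis=1, keepdims=True) + eps)
    real = real / (np.linalg.norm(real, axis=1, keepdims=True) + eps)
    return float(np.mean(np.sum(sim * real, axis=1)))


if __name__ == "__main__":
    # Synthetic stand-ins for encoder outputs; no real encoder is loaded here.
    rng = np.random.default_rng(0)
    sim_feats = rng.normal(size=(200, 16))
    real_feats = sim_feats + 0.1 * rng.normal(size=(200, 16))
    actions = sim_feats @ rng.normal(size=(16, 7))
    print("action score (proxy):", action_score(sim_feats, actions))
    print("invariance score (proxy):", domain_invariance_score(sim_feats, real_feats))
```

Under these assumptions, an encoder would be preferred when both numbers are high: its features carry enough information to recover actions, and they change little between simulated and real views of the same scene.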
Taxonomy
Topics: EEG and Brain-Computer Interfaces
