Bridging the Sim2Real Gap: Vision Encoder Pre-Training for Visuomotor Policy Transfer

Yash Yardi; Samuel Biruduganti; Lars Ankile

arXiv:2501.16389 · cs.RO · September 9, 2025

TL;DR

This paper evaluates how large-scale pre-trained vision encoders can improve the transfer of visuomotor policies from simulation to real-world robotic tasks by analyzing their feature extraction and invariance properties.

Contribution

The paper introduces an offline evaluation framework that assesses how suitable vision encoders are for Sim2Real transfer, revealing the architectural and pre-training factors that influence transfer success.

Findings

01. Manipulation-pretrained encoders outperform others in Action Score.

02. CNN-based encoders exhibit stronger domain invariance than ViTs.

03. Combining properties of different encoders enhances transferability.

Abstract

Simulation offers a scalable and efficient alternative to real-world data collection for learning visuomotor robotic policies. However, the simulation-to-reality, or Sim2Real, distribution shift -- introduced by employing simulation-trained policies in real-world environments -- frequently prevents successful policy transfer. We present an offline framework to evaluate how well large-scale pre-trained vision encoders address the Sim2Real gap. We examine a diverse collection of encoders, assessing their ability to extract features necessary for robot control (Action Score) while remaining invariant to task-irrelevant environmental variations (Domain Invariance Score). Evaluating 23 encoders, we reveal patterns across architectures, pre-training datasets, and parameter scales. Our findings show that manipulation-pretrained encoders consistently achieve higher Action…
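The abstract names two offline quantities but, as excerpted here, does not say how they are computed. The sketch below is only an illustration of the general idea, under assumptions that are not from the paper: a held-out linear-probe R² stands in for an action-style score, and mean cosine similarity between paired simulated and real embeddings stands in for an invariance-style score. The function names `action_probe_r2` and `paired_cosine_invariance`, the ridge penalty, and the train/test split are all hypothetical.

```python
import numpy as np


def action_probe_r2(features, actions, ridge=1e-3, train_frac=0.8):
    """Assumed proxy, not the paper's Action Score: fit a ridge-regularized
    linear probe from frozen encoder features to actions and return the
    held-out R^2."""
    n = features.shape[0]
    split = int(n * train_frac)

    def with_bias(x):
        # Append a constant column so the probe can learn an offset.
        return np.hstack([x, np.ones((x.shape[0], 1))])

    x_train, x_test = with_bias(features[:split]), with_bias(features[split:])
    y_train, y_test = actions[:split], actions[split:]

    # Closed-form ridge solution: (X^T X + ridge * I)^-1 X^T Y
    gram = x_train.T @ x_train + ridge * np.eye(x_train.shape[1])
    weights = np.linalg.solve(gram, x_train.T @ y_train)

    pred = x_test @ weights
    ss_res = np.sum((y_test - pred) ** 2)
    ss_tot = np.sum((y_test - y_test.mean(axis=0)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def paired_cosine_invariance(sim_features, real_features):
    """Assumed proxy, not the paper's Domain Invariance Score: mean cosine
    similarity between embeddings of paired sim and real observations."""
    sim = sim_features / np.linalg.norm(sim_features, axis=1, keepdims=True)
    real = real_features / np.linalg.norm(real_features, axis=1, keepdims=True)
    return float(np.mean(np.sum(sim * real, axis=1)))


# Synthetic stand-ins for encoder outputs; no real encoder or dataset is used.
rng = np.random.default_rng(0)
sim_feats = rng.normal(size=(200, 16))
real_feats = sim_feats + 0.1 * rng.normal(size=(200, 16))
actions = sim_feats @ rng.normal(size=(16, 7)) + 0.01 * rng.normal(size=(200, 7))

print("action probe R^2:", action_probe_r2(sim_feats, actions))
print("paired cosine invariance:", paired_cosine_invariance(sim_feats, real_feats))
```

In this toy setup, an encoder whose features linearly predict actions scores high on the first proxy, and one whose embeddings barely move between paired sim and real inputs scores high on the second. The repository listed under Code & Models is the place to look for the authors' actual definitions.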

Code & Models

Repositories

yyardi/bridging-the-sim2real-gap
PyTorch · Official

Taxonomy

Topics: EEG and Brain-Computer Interfaces