Sim2real Image Translation Enables Viewpoint-Robust Policies from Fixed-Camera Datasets

Jeremiah Coholich; Justin Wit; Robert Azarcon; Zsolt Kira

arXiv:2601.09605 · cs.CV · February 16, 2026

TL;DR

This paper introduces MANGO, an unpaired image translation method that improves the viewpoint robustness of vision-based robot manipulation policies. It translates simulated images rendered from diverse camera viewpoints into realistic images, and needs only a small amount of real-world fixed-camera data to train.

Contribution

MANGO employs a segmentation-conditioned InfoNCE loss and a regularized discriminator to maintain viewpoint consistency during sim2real translation, improving policy robustness.
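
For readers unfamiliar with the InfoNCE loss named above, the following is a minimal sketch of the standard formulation for a single query feature, written in plain Python. It is not the authors' implementation. The function names, the temperature default of 0.07, and the `masked_info_nce` variant are assumptions introduced here for illustration. The variant shows one hypothetical way a segmentation label could enter such a loss; the source does not describe how MANGO conditions on segmentation.

```python
import math


def info_nce(query, positive, negatives, temperature=0.07):
    """Standard InfoNCE loss for one query feature vector.

    query, positive: lists of floats; negatives: list of such lists.
    Returns -log(exp(s_pos / t) / (exp(s_pos / t) + sum_j exp(s_neg_j / t))),
    where s is cosine similarity and t is the temperature.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    q = normalize(query)
    # Index 0 holds the positive logit; the rest are negative logits.
    logits = [dot(q, normalize(positive)) / temperature]
    logits += [dot(q, normalize(n)) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


def masked_info_nce(query, positive, negatives, query_label,
                    negative_labels, temperature=0.07):
    """Hypothetical variant (an assumption, not from the paper):
    keep only negatives whose segmentation label differs from the query's."""
    kept = [n for n, lbl in zip(negatives, negative_labels)
            if lbl != query_label]
    return info_nce(query, positive, kept, temperature)
```

The loss is near zero when the query matches its positive and differs from every negative, and it grows when a negative is more similar to the query than the positive is.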

Findings

01. MANGO outperforms other image translation methods in diverse viewpoint translation.

02. Augmentation with MANGO increases success rates in real-world manipulation tasks by over 40%.

03. The method requires only a small amount of real-world fixed-camera data.

Abstract

Vision-based policies for robot manipulation have achieved significant recent success, but are still brittle to distribution shifts such as camera viewpoint variations. Robot demonstration data is scarce and often lacks appropriate variation in camera viewpoints. Simulation offers a way to collect robot demonstrations at scale with comprehensive coverage of different viewpoints, but presents a visual sim2real challenge. To bridge this gap, we propose MANGO -- an unpaired image translation method with a novel segmentation-conditioned InfoNCE loss, a highly-regularized discriminator design, and a modified PatchNCE loss. We find that these elements are crucial for maintaining viewpoint consistency during sim2real translation. When training MANGO, we only require a small amount of fixed-camera data from the real world, but show that our method can generate diverse unseen viewpoints by…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Robot Manipulation and Learning · Advanced Vision and Imaging