Zero-Shot Visual Generalization in Robot Manipulation

Sumeet Batra; Gaurav Sukhatme

arXiv:2505.11719 · cs.RO · May 20, 2025

Open Access

TL;DR

This paper advances zero-shot visual generalization in robot manipulation by scaling disentangled representations and associative memory to complex tasks, enabling robustness to visual and camera perturbations in simulation and real-world settings.

Contribution

It introduces a scalable approach combining disentangled representations, associative memory, and model equivariance to improve visual robustness and zero-shot adaptability in manipulation policies.

Findings

01. Significant improvement in visual generalization over state-of-the-art methods.

02. Successful zero-shot adaptation to visual perturbations in simulation and hardware.

03. Enhanced robustness to camera rotations through a novel invariance technique.

Abstract

Training vision-based manipulation policies that are robust across diverse visual environments remains an important and unresolved challenge in robot learning. Current approaches often sidestep the problem by relying on invariant representations such as point clouds and depth, or by brute-forcing generalization through visual domain randomization and/or large, visually diverse datasets. Disentangled representation learning, especially when combined with principles of associative memory, has recently shown promise in enabling vision-based reinforcement learning policies to be robust to visual distribution shifts. However, these techniques have largely been constrained to simpler benchmarks and toy environments. In this work, we scale disentangled representation learning and associative memory to more visually and dynamically complex manipulation tasks and demonstrate zero-shot…

Taxonomy

Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion