Zero-Shot Visual Generalization in Robot Manipulation
Sumeet Batra, Gaurav Sukhatme

TL;DR
This paper advances zero-shot visual generalization in robot manipulation by scaling disentangled representations and associative memory to complex tasks, enabling robustness to visual and camera perturbations in simulation and real-world settings.
Contribution
It introduces a scalable approach combining disentangled representations, associative memory, and model equivariance to improve visual robustness and zero-shot adaptability in manipulation policies.
Findings
Significant improvement in visual generalization over state-of-the-art methods.
Successful zero-shot adaptation to visual perturbations in simulation and hardware.
Enhanced robustness to camera rotations through a novel invariance technique.
Abstract
Training vision-based manipulation policies that are robust across diverse visual environments remains an important and unresolved challenge in robot learning. Current approaches often sidestep the problem by relying on invariant representations such as point clouds and depth, or by brute-forcing generalization through visual domain randomization and/or large, visually diverse datasets. Disentangled representation learning, especially when combined with principles of associative memory, has recently shown promise in enabling vision-based reinforcement learning policies to be robust to visual distribution shifts. However, these techniques have largely been constrained to simpler benchmarks and toy environments. In this work, we scale disentangled representation learning and associative memory to more visually and dynamically complex manipulation tasks and demonstrate zero-shot…
Taxonomy
Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion
