View-Invariant Policy Learning via Zero-Shot Novel View Synthesis
Stephen Tian, Blake Wulfe, Kyle Sargent, Katherine Liu, Sergey Zakharov, Vitor Guizilini, Jiajun Wu

TL;DR
This paper introduces a method for learning view-invariant manipulation policies by leveraging zero-shot novel view synthesis models, enabling robot policies trained on single-view demonstrations to generalize across camera viewpoints.
Contribution
It proposes View Synthesis Augmentation (VISTA), a novel data-augmentation scheme that uses zero-shot view synthesis models to improve viewpoint-invariant policy learning from single-view demonstrations.
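The augmentation idea described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `synthesize_view` is a hypothetical callable standing in for a zero-shot single-image novel view synthesis model, and the camera-offset parameterization, sampling ranges, and `augment_prob` are assumptions made for the example. The only property taken from the summary is that observations are re-rendered from alternate viewpoints while the demonstration itself stays single-view.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for illustration only.
Image = List[List[float]]                 # toy image: rows of pixel values
CameraOffset = Tuple[float, float, float]  # (yaw_deg, pitch_deg, radius_delta), assumed
Sample = Tuple[Image, Sequence[float]]     # (observation image, action label)


def sample_camera_offset(
    rng: random.Random,
    max_angle_deg: float = 30.0,      # assumed range, not from the paper
    max_radius_delta: float = 0.1,    # assumed range, not from the paper
) -> CameraOffset:
    """Draw a random relative camera change around the original viewpoint."""
    yaw = rng.uniform(-max_angle_deg, max_angle_deg)
    pitch = rng.uniform(-max_angle_deg, max_angle_deg)
    radius_delta = rng.uniform(-max_radius_delta, max_radius_delta)
    return (yaw, pitch, radius_delta)


def augment_with_novel_views(
    demos: Sequence[Sample],
    synthesize_view: Callable[[Image, CameraOffset], Image],
    rng: random.Random,
    augment_prob: float = 0.5,        # assumed hyperparameter
) -> List[Sample]:
    """Replace some observations with synthesized views; action labels are kept as-is."""
    augmented: List[Sample] = []
    for image, action in demos:
        if rng.random() < augment_prob:
            # Stand-in for a call to a zero-shot view synthesis model.
            image = synthesize_view(image, sample_camera_offset(rng))
        augmented.append((image, action))
    return augmented
```

In this sketch the augmented samples would simply be fed to an ordinary imitation-learning loop in place of the original single-view samples; how the real method samples viewpoints or mixes original and synthesized images is not stated in this summary.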
Findings
Policies trained with VISTA outperform baselines in diverse tasks
Viewpoint robustness improves in both simulated and real-world settings
Zero-shot view synthesis enables generalization to unseen viewpoints
Abstract
Large-scale visuomotor policy learning is a promising approach toward developing generalizable manipulation systems. Yet, policies that can be deployed on diverse embodiments, environments, and observational modalities remain elusive. In this work, we investigate how knowledge from large-scale visual data of the world may be used to address one axis of variation for generalizable manipulation: observational viewpoint. Specifically, we study single-image novel view synthesis models, which learn 3D-aware scene-level priors by rendering images of the same scene from alternate camera viewpoints given a single input image. For practical application to diverse robotic data, these models must operate zero-shot, performing view synthesis on unseen tasks and environments. We empirically analyze view synthesis models within a simple data-augmentation scheme that we call View Synthesis…
Peer Reviews
Decision · CoRL 2024
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Internet Traffic Analysis and Secure E-voting · Text and Document Classification Technologies
