View-Invariant Policy Learning via Zero-Shot Novel View Synthesis

Stephen Tian; Blake Wulfe; Kyle Sargent; Katherine Liu; Sergey Zakharov; Vitor Guizilini; Jiajun Wu

arXiv:2409.03685·cs.RO·June 3, 2025

Open Access · 1 Dataset · 3 Reviews

TL;DR

This paper introduces a method for learning view-invariant manipulation policies that uses zero-shot novel view synthesis models to augment single-view demonstration data, so that robot policies generalize across camera viewpoints.

Contribution

It proposes View Synthesis Augmentation (VISTA), a novel data-augmentation scheme that uses zero-shot view synthesis models to improve viewpoint-invariant policy learning from single-view demonstrations.
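
The augmentation idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `ViewSynthesizer` interface, the `Step` container, the yaw/pitch offset ranges, and the `p_augment` parameter are all hypothetical names and values chosen for illustration. The sketch assumes that each single-view observation is replaced, with some probability, by a synthesized image from a randomly sampled camera offset, while the demonstrated action is kept unchanged.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple

# Hypothetical types: an image is any object; a camera offset is (yaw, pitch) in degrees.
Image = Any
CameraOffset = Tuple[float, float]
# Hypothetical interface standing in for a zero-shot single-image view synthesis model.
ViewSynthesizer = Callable[[Image, CameraOffset], Image]


@dataclass
class Step:
    image: Image  # single-view camera observation
    action: Any   # demonstrated action, never modified by the augmentation


def sample_offset(rng: random.Random,
                  max_yaw: float = 30.0,
                  max_pitch: float = 15.0) -> CameraOffset:
    """Sample a random camera offset. The ranges are illustrative assumptions."""
    return (rng.uniform(-max_yaw, max_yaw), rng.uniform(-max_pitch, max_pitch))


def augment_demo(demo: Sequence[Step],
                 synthesize: ViewSynthesizer,
                 p_augment: float = 0.5,
                 rng: Optional[random.Random] = None) -> List[Step]:
    """Return a copy of the demonstration with some observations re-rendered."""
    if rng is None:
        rng = random.Random()
    augmented = []
    for step in demo:
        if rng.random() < p_augment:
            # Replace the observation with a synthesized novel view.
            image = synthesize(step.image, sample_offset(rng))
        else:
            image = step.image
        # The action label is carried over unchanged.
        augmented.append(Step(image=image, action=step.action))
    return augmented


def fake_synthesizer(image: Image, offset: CameraOffset) -> Image:
    """Stand-in for a real model: tags the image instead of rendering it."""
    return ("novel_view", image, offset)


demo = [Step(image=f"frame_{i}", action=i) for i in range(4)]
augmented = augment_demo(demo, fake_synthesizer, p_augment=1.0, rng=random.Random(0))
unchanged = augment_demo(demo, fake_synthesizer, p_augment=0.0, rng=random.Random(0))
```

In practice `fake_synthesizer` would be swapped for a call into an actual view synthesis model, and the augmented steps would be fed to whatever policy-learning pipeline is in use.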

Findings

01 Policies trained with VISTA outperform baselines in diverse tasks

02 Viewpoint robustness improves in both simulated and real-world settings

03 Zero-shot view synthesis enables generalization to unseen viewpoints

Abstract

Large-scale visuomotor policy learning is a promising approach toward developing generalizable manipulation systems. Yet, policies that can be deployed on diverse embodiments, environments, and observational modalities remain elusive. In this work, we investigate how knowledge from large-scale visual data of the world may be used to address one axis of variation for generalizable manipulation: observational viewpoint. Specifically, we study single-image novel view synthesis models, which learn 3D-aware scene-level priors by rendering images of the same scene from alternate camera viewpoints given a single input image. For practical application to diverse robotic data, these models must operate zero-shot, performing view synthesis on unseen tasks and environments. We empirically analyze view synthesis models within a simple data-augmentation scheme that we call View Synthesis…

Peer Reviews

Decision·CoRL 2024

Reviewer 01 · Rating 3 · Confidence 3

Reviewer 02 · Rating 3 · Confidence 4

Reviewer 03 · Rating 3 · Confidence 4

Code & Models

Datasets

s-tian/VISTA_Data
dataset · 72 dl

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Internet Traffic Analysis and Secure E-voting · Text and Document Classification Technologies