Efficient Camera Pose Augmentation for View Generalization in Robotic Policy Learning
Sen Wang, Huaiyi Dong, Jingyi Tian, Jiayi Li, Zhuo Yang, Tongtong Cao, Anlin Chen, Shuang Wu, Le Wang, and Sanping Zhou

TL;DR
This paper introduces GenSplat, a 3D Gaussian Splatting framework that enhances view generalization in robotic policies by rendering diverse synthetic views from sparse inputs, improving robustness to spatial perturbations.
Contribution
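The paper presents a feed-forward 3D Gaussian Splatting method that pairs a permutation-equivariant architecture with a 3D-prior distillation strategy. The distillation regularizes reconstruction, so novel views rendered from the resulting scenes can be used to train view-generalized policies.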
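To make the term "permutation-equivariant" concrete, here is a toy sketch in plain Python. It is not GenSplat's architecture: the function `equivariant_layer` and its weights are hypothetical, chosen only to show the property that reordering the input views reorders the outputs in the same way and changes nothing else.

```python
def equivariant_layer(views, w_self=0.7, w_mean=0.3):
    """Toy permutation-equivariant layer (illustrative only, not from the paper).

    views: list of per-view feature vectors (lists of floats, equal length).
    Each output mixes a view's own features with the mean over all views.
    The mean ignores input order, so permuting the views permutes the outputs.
    """
    n = len(views)
    d = len(views[0])
    mean = [sum(v[j] for v in views) / n for j in range(d)]
    return [[w_self * v[j] + w_mean * mean[j] for j in range(d)] for v in views]


if __name__ == "__main__":
    views = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
    perm = [2, 0, 1]  # an arbitrary reordering of the input views
    out = equivariant_layer(views)
    out_perm = equivariant_layer([views[i] for i in perm])
    # Output k of the permuted call should match output perm[k] of the original.
    print(all(
        abs(out_perm[k][j] - out[perm[k]][j]) < 1e-9
        for k in range(len(perm)) for j in range(2)
    ))
```

The practical appeal of this property for sparse, unordered image sets is that no input view needs to be singled out as a reference frame.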
Findings
GenSplat enables high-fidelity 3D scene reconstruction from sparse inputs.
Synthetic view augmentation improves policy robustness under spatial perturbations.
The approach outperforms baselines in generalization to unseen views.
Abstract
Prevailing 2D-centric visuomotor policies exhibit a pronounced deficiency in novel view generalization, as their reliance on static observations hinders consistent action mapping across unseen views. In response, we introduce GenSplat, a feed-forward 3D Gaussian Splatting framework that facilitates view-generalized policy learning through novel view rendering. GenSplat employs a permutation-equivariant architecture to reconstruct high-fidelity 3D scenes from sparse, uncalibrated inputs in a single forward pass. To ensure structural integrity, we design a 3D-prior distillation strategy that regularizes the 3DGS optimization, preventing the geometric collapse typical of purely photometric supervision. By rendering diverse synthetic views from these stable 3D representations, we systematically augment the observational manifold during training. This augmentation forces the policy to ground…
