TL;DR
This paper introduces GenWildSplat, a real-time, feed-forward framework for 3D reconstruction from sparse, unposed images that generalizes across diverse outdoor scenes without per-scene optimization.
Contribution
It presents a novel, generalizable approach for outdoor 3D reconstruction from internet images that avoids scene-specific training and optimization.
Findings
Achieves state-of-the-art feed-forward rendering quality.
Operates in real time without test-time optimization.
Successfully generalizes across diverse illumination and occlusion conditions.
Abstract
Reconstructing 3D scenes from sparse, unposed images remains challenging under real-world conditions with varying illumination and transient occlusions. Existing methods rely on scene-specific optimization using appearance embeddings or dynamic masks, which requires extensive per-scene training and fails under sparse views. Moreover, evaluations on limited scenes raise questions about generalization. We present GenWildSplat, a feed-forward framework for sparse-view outdoor reconstruction that requires no per-scene optimization. Given unposed internet images, GenWildSplat predicts depth, camera parameters, and 3D Gaussians in a canonical space using learned geometric priors. An appearance adapter modulates appearance for target lighting conditions, while semantic segmentation handles transient objects. Through curriculum learning on synthetic and real data, GenWildSplat generalizes…
