Scene Grounding In the Wild
Tamir Cohen, Leo Segre, Shay Shomer-Chai, Shai Avidan, Hadar Averbuch-Elor

TL;DR
This paper introduces a scene grounding framework that aligns partial 3D reconstructions with complete reference models built from Google Earth Studio renderings, improving global consistency in large-scale scene reconstruction.
Contribution
The work presents a new method for aligning partial scene reconstructions to full reference models using semantic-aware 3D Gaussian Splatting and inverse feature optimization, addressing the domain gap between pseudo-synthetic reference renderings and real-world photographs.
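To make the idea of grounding a partial reconstruction concrete, here is a minimal sketch of what the final alignment amounts to geometrically: a similarity transform (scale, rotation, translation) that maps points in the partial reconstruction's frame into the reference model's frame. This is not the paper's method. It uses the classical Umeyama least-squares solution and assumes 3D point correspondences are already known, which is exactly the hard part that the semantic features are meant to solve. The function name `umeyama_similarity` and the synthetic data are illustrative assumptions.

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Least-squares similarity transform with dst ~ scale * R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points (assumed known here).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]

    # Center both point sets.
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Cross-covariance between the centered sets and its SVD.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Guard against a reflection so that R is a proper rotation.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    scale = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - scale * R @ mu_src
    return scale, R, t

# Synthetic check: "partial" lives in an arbitrary frame, "reference" in the
# frame of the complete model (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
partial = rng.normal(size=(50, 3))
angle = np.deg2rad(30.0)
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle),  np.cos(angle), 0.0],
    [0.0,            0.0,           1.0],
])
s_true = 2.5
t_true = np.array([10.0, -4.0, 1.5])
reference = s_true * partial @ R_true.T + t_true

s, R, t = umeyama_similarity(partial, reference)
aligned = s * partial @ R.T + t  # partial reconstruction in the reference frame
```

With exact correspondences the recovered transform matches the one used to generate the data. With in-the-wild imagery no such correspondences are given, which is why a feature-based optimization is needed instead of this closed-form step.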
Findings
Improves global alignment of partial reconstructions with reference models.
Mitigates failure modes of existing end-to-end reconstruction models, such as disconnected partial reconstructions and erroneously merged non-overlapping regions.
Introduces the WikiEarth dataset for benchmarking scene grounding methods.
Abstract
Reconstructing accurate 3D models of large-scale real-world scenes from unstructured, in-the-wild imagery remains a core challenge in computer vision, especially when the input views have little or no overlap. In such cases, existing reconstruction pipelines often produce multiple disconnected partial reconstructions or erroneously merge non-overlapping regions into overlapping geometry. In this work, we propose a framework that grounds each partial reconstruction to a complete reference model of the scene, enabling globally consistent alignment even in the absence of visual overlap. We obtain reference models from dense, geospatially accurate pseudo-synthetic renderings derived from Google Earth Studio. These renderings provide full scene coverage but differ substantially in appearance from real-world photographs. Our key insight is that, despite this significant domain gap, both…
