Taking Language Embedded 3D Gaussian Splatting into the Wild
Yuze Wang, Yue Qi

TL;DR
This paper introduces a novel framework extending language embedded 3D Gaussian splatting for open-vocabulary scene understanding from unconstrained photo collections, enabling immersive 3D architectural analysis and editing.
Contribution
It proposes a new method combining multi-appearance features, uncertainty maps, and a transient uncertainty-aware autoencoder for improved 3D scene understanding and segmentation.
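To make the idea of multi-appearance features and uncertainty maps concrete, here is a minimal sketch for a single pixel. It is not the authors' implementation: this summary does not define how the uncertainty is computed, so the disagreement measure (mean cosine distance to the averaged feature), the function names `fuse_appearances` and `relevancy_mask`, and the threshold value are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def fuse_appearances(features):
    """Fuse K language feature vectors of one pixel, one per rendered appearance.

    Returns (mean_feature, uncertainty). The uncertainty here is an ASSUMED
    measure: the average cosine distance between each appearance's feature
    and the mean feature. It is 0 when all appearances agree.
    """
    k = len(features)
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / k for d in range(dim)]
    uncertainty = sum(1.0 - cosine(f, mean) for f in features) / k
    return mean, uncertainty


def relevancy_mask(pixel_features, text_embedding, threshold=0.5):
    """Hypothetical open-vocabulary query: mark pixels whose feature is
    similar enough to the text embedding. The threshold is illustrative."""
    return [cosine(f, text_embedding) >= threshold for f in pixel_features]


# Features that agree across appearances give zero uncertainty.
_, u_stable = fuse_appearances([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
# Features that change with appearance give a higher uncertainty.
_, u_unstable = fuse_appearances([[1.0, 0.0], [0.0, 1.0]])
mask = relevancy_mask([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```

In a sketch like this, pixels with high uncertainty could be down-weighted when the language features are compressed and attached to the 3D Gaussians, though how the paper actually uses its uncertainty maps is not stated in this summary.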
Findings
Outperforms existing methods in open-vocabulary segmentation
Enables interactive 3D scene exploration and editing
Introduces PT-OVS benchmark dataset for evaluation
Abstract
Recent advances in leveraging large-scale Internet photo collections for 3D reconstruction have enabled immersive virtual exploration of landmarks and historic sites worldwide. However, little attention has been given to the immersive understanding of architectural styles and structural knowledge, which remains largely confined to browsing static text-image pairs. Therefore, can we draw inspiration from 3D in-the-wild reconstruction techniques and use unconstrained photo collections to create an immersive approach for understanding the 3D structure of architectural components? To this end, we extend language embedded 3D Gaussian splatting (3DGS) and propose a novel framework for open-vocabulary scene understanding from unconstrained photo collections. Specifically, we first render multiple appearance images from the same viewpoint as the unconstrained image with the reconstructed…
