Taking Language Embedded 3D Gaussian Splatting into the Wild

Yuze Wang; Yue Qi

arXiv:2507.19830 · cs.GR · August 6, 2025

TL;DR

This paper extends language embedded 3D Gaussian splatting (3DGS) to open-vocabulary scene understanding from unconstrained photo collections, supporting immersive analysis and editing of 3D architecture.

Contribution

The method combines multi-appearance features, uncertainty maps, and a transient uncertainty-aware autoencoder to improve 3D scene understanding and segmentation.

Findings

01 Outperforms existing methods in open-vocabulary segmentation

02 Enables interactive 3D scene exploration and editing

03 Introduces the PT-OVS benchmark dataset for evaluation

Abstract

Recent advances in leveraging large-scale Internet photo collections for 3D reconstruction have enabled immersive virtual exploration of landmarks and historic sites worldwide. However, little attention has been given to the immersive understanding of architectural styles and structural knowledge, which remains largely confined to browsing static text-image pairs. Therefore, can we draw inspiration from 3D in-the-wild reconstruction techniques and use unconstrained photo collections to create an immersive approach for understanding the 3D structure of architectural components? To this end, we extend language embedded 3D Gaussian splatting (3DGS) and propose a novel framework for open-vocabulary scene understanding from unconstrained photo collections. Specifically, we first render multiple appearance images from the same viewpoint as the unconstrained image with the reconstructed…
