TL;DR
This paper introduces LaGa, a method for language-driven 3D scene understanding that captures view-dependent semantics by decomposing scenes into objects and aggregating multi-view semantic information, yielding a +18.7% mIoU gain over the previous state of the art on LERF-OVS.
Contribution
LaGa is the first approach to explicitly model view-dependent semantics in 3D Gaussian Splatting by decomposing scenes and aggregating multi-view semantic features.
Findings
LaGa achieves a +18.7% mIoU improvement over the previous state of the art (SOTA) on the LERF-OVS dataset.
It effectively captures view-dependent semantics for better scene understanding.
LaGa outperforms previous methods in 3D semantic segmentation accuracy.
Abstract
Recent advancements in 3D Gaussian Splatting (3D-GS) enable high-quality 3D scene reconstruction from RGB images. Many studies extend this paradigm for language-driven open-vocabulary scene understanding. However, most of them simply project 2D semantic features onto 3D Gaussians and overlook a fundamental gap between 2D and 3D understanding: a 3D object may exhibit various semantics from different viewpoints--a phenomenon we term view-dependent semantics. To address this challenge, we propose LaGa (Language Gaussians), which establishes cross-view semantic connections by decomposing the 3D scene into objects. Then, it constructs view-aggregated semantic representations by clustering semantic descriptors and reweighting them based on multi-view semantics. Extensive experiments demonstrate that LaGa effectively captures key information from view-dependent semantics, enabling a more…
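The aggregation step described in the abstract (clustering an object's per-view semantic features into descriptors, then reweighting them) can be illustrated with a rough sketch. This is not the authors' implementation: the spherical k-means variant, the weighting by cluster membership share, the weighted-sum scoring, and all function names below are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def cluster_descriptors(view_feats, k, iters=20, seed=0):
    """Spherical k-means over the per-view features of ONE object.

    view_feats: (N, D) array, one semantic feature per view (N >= k).
    Returns (centers, labels): k unit-norm descriptors and the cluster
    index of each view. The clustering choice is an assumption.
    """
    rng = np.random.default_rng(seed)
    feats = l2_normalize(view_feats)
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(iters):
        # Assign each view to its most similar center (cosine similarity).
        labels = np.argmax(feats @ centers.T, axis=1)
        for j in range(k):
            members = feats[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = l2_normalize(members.mean(axis=0))
    labels = np.argmax(feats @ centers.T, axis=1)
    return centers, labels

def descriptor_weights(labels, k):
    """Hypothetical reweighting: share of views supporting each descriptor."""
    counts = np.bincount(labels, minlength=k).astype(float)
    return counts / counts.sum()

def object_relevance(query_feat, centers, weights):
    """Weighted sum of cosine similarities between a query and descriptors."""
    sims = centers @ l2_normalize(query_feat)
    return float(np.sum(weights * sims))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy object seen from 12 views, with random 16-dim features per view.
    feats = rng.normal(size=(12, 16))
    centers, labels = cluster_descriptors(feats, k=3)
    weights = descriptor_weights(labels, k=3)
    print(object_relevance(rng.normal(size=16), centers, weights))
```

In a real pipeline the per-view features would come from a vision-language encoder applied to each view of a decomposed object, and the query would be a text embedding from the same model. Weighting by membership share is only one plausible choice; it favors semantics that recur across many viewpoints over those visible from a few.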
