LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding
Hao Li, Minghan Qin, Zhengyu Zou, Diqi He, Xinhao Ji, Bohan Li, Bingquan Dai, Dingwen Zhang, Junwei Han

TL;DR
LangSurf aligns a 3D language field with object surfaces using surface Gaussian representations, improving 3D segmentation and editing through more precise object localization and richer contextual features.
Contribution
The paper proposes a new Language-Embedded Surface Field (LangSurf) with joint training and hierarchical context modules for precise 3D language alignment and segmentation.
Findings
Outperforms previous state-of-the-art methods in 2D and 3D semantic segmentation.
Enables accurate 3D object segmentation and editing with text queries.
Demonstrates significant improvements in open-vocabulary recognition tasks.
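
To make the text-query segmentation and editing finding concrete, here is a minimal sketch of how a query could select Gaussians in a language-embedded scene. It assumes each Gaussian carries a language feature vector in the same embedding space as the text query, and that selection is a cosine-similarity threshold. The function names, the toy features, and the 0.8 threshold are all hypothetical and are not taken from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def select_gaussians(features, text_embedding, threshold=0.8):
    """3D segmentation sketch: indices of Gaussians whose feature matches the query.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return [i for i, f in enumerate(features)
            if cosine(f, text_embedding) >= threshold]

def remove_gaussians(gaussians, selected):
    """Object-removal sketch: keep every Gaussian that was not selected."""
    drop = set(selected)
    return [g for i, g in enumerate(gaussians) if i not in drop]

# Toy scene: four Gaussians with 3-dimensional language features (hypothetical data).
features = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.0, 0.0]  # stand-in for an encoded text query

selected = select_gaussians(features, query)
remaining = remove_gaussians(["g0", "g1", "g2", "g3"], selected)
```

Because the selection happens on the Gaussians themselves rather than on rendered 2D feature maps, the same index list can drive both segmentation and edits such as removal. This is why features that sit on the object surface matter: outlier Gaussians with a matching feature would be selected too.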
Abstract
Applying Gaussian Splatting to perception tasks for 3D scene understanding is becoming increasingly popular. Most existing works focus on rendering 2D feature maps from novel viewpoints, which yields an imprecise 3D language field with outlier language features and ultimately fails to align objects in 3D space. Because these approaches extract features from masked images, they also lack essential contextual information, leading to inaccurate feature representations. To this end, we propose a Language-Embedded Surface Field (LangSurf), which accurately aligns the 3D language field with the surfaces of objects, enabling precise 2D and 3D segmentation from text queries and broadening downstream tasks such as object removal and editing. The core of LangSurf is a joint training strategy that flattens the language Gaussians onto the object surfaces using geometry supervision and contrastive…
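
The abstract is cut off before it describes the training losses, so the following is only an illustrative sketch of what "flattening" a Gaussian can mean. One regularizer used in surface-oriented Gaussian Splatting work penalizes the smallest of each Gaussian's three scale axes, pushing it toward a thin disc that hugs a surface. Whether LangSurf uses this exact term is an assumption; the function name and toy values are hypothetical.

```python
def flatten_loss(scales):
    """Mean over Gaussians of the smallest absolute scale axis.

    The value reaches zero when every Gaussian has collapsed to a flat disc.
    This is an assumed regularizer for illustration, not the paper's stated loss.
    """
    if not scales:
        return 0.0
    return sum(min(abs(s) for s in axes) for axes in scales) / len(scales)

# Two toy Gaussians: one still has thickness 0.1, the other is already flat.
scales = [(0.5, 0.4, 0.1), (0.3, 0.3, 0.0)]
loss = flatten_loss(scales)
```

In a real pipeline, a term like this would be added to the rendering and language losses with a small weight, so that geometry and language features are optimized jointly.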
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Computer Graphics and Visualization Techniques
Methods: Segment Anything Model · Focus · ALIGN
