SLGaussian: Fast Language Gaussian Splatting in Sparse Views
Kangjie Chen, BingQuan Dai, Minghan Qin, Dongbin Zhang, Peihao Li, Yingshuang Zou, Haoqian Wang

TL;DR
SLGaussian is a fast, feed-forward method that constructs 3D semantic fields from sparse viewpoints, enabling efficient and accurate 3D scene understanding with language integration, outperforming existing approaches.
Contribution
It introduces SLGaussian, a novel approach that embeds language into 3D space efficiently from sparse views, avoiding costly per-scene optimization.
Findings
Outperforms existing methods in IoU, localization accuracy, and mIoU on two-view sparse 3D object querying and segmentation.
Scene inference time is under 30 seconds.
Open-vocabulary querying takes only 0.011 seconds per query.
Abstract
3D semantic field learning is crucial for applications like autonomous navigation, AR/VR, and robotics, where accurate comprehension of 3D scenes from limited viewpoints is essential. Existing methods struggle under sparse view conditions, relying on inefficient per-scene multi-view optimizations, which are impractical for many real-world tasks. To address this, we propose SLGaussian, a feed-forward method for constructing 3D semantic fields from sparse viewpoints, allowing direct inference of 3DGS-based scenes. By ensuring consistent SAM segmentations through video tracking and using low-dimensional indexing for high-dimensional CLIP features, SLGaussian efficiently embeds language information in 3D space, offering a robust solution for accurate 3D scene understanding under sparse view conditions. In experiments on two-view sparse 3D object querying and segmentation in the LERF and…
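The idea of "low-dimensional indexing for high-dimensional CLIP features" mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes that each 3D point stores only an integer index into a small table of per-segment features, so a text query is compared against the table once rather than against every point. The names `codebook`, `point_to_segment`, and `query_points` are hypothetical, and 4-dimensional vectors stand in for real CLIP embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical table: one high-dimensional feature per segmented object.
# Real CLIP features would have hundreds of dimensions; 4 are used here.
codebook = [
    [1.0, 0.0, 0.0, 0.0],  # segment 0
    [0.0, 1.0, 0.0, 0.0],  # segment 1
    [0.0, 0.0, 1.0, 1.0],  # segment 2
]

# Assumption: each 3D point keeps only a low-dimensional index into the table.
point_to_segment = [0, 0, 1, 2, 2, 1]


def query_points(text_feature, codebook, point_to_segment):
    """Score each point by looking up the similarity of its segment."""
    # One similarity per table entry, independent of the number of points.
    scores = [cosine(text_feature, feature) for feature in codebook]
    return [scores[index] for index in point_to_segment]


# Stand-in for an embedded text query.
text_feature = [0.0, 0.1, 0.9, 1.0]
relevancy = query_points(text_feature, codebook, point_to_segment)

# Index of the segment that best matches the query.
segment_scores = [cosine(text_feature, feature) for feature in codebook]
best_segment = max(range(len(codebook)), key=lambda i: segment_scores[i])
```

Under this assumption, the cost of a query scales with the number of segments plus a cheap per-point lookup, which is one plausible reason open-vocabulary queries can be fast.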
Taxonomy
Topics: Text and Document Classification Technologies · Speech Recognition and Synthesis · Machine Learning and Data Classification
Methods: Segment Anything Model · Contrastive Language-Image Pre-training
