SparseLGS: Sparse View Language Embedded Gaussian Splatting
Jun Hu, Zhang Chen, Zhong Li, Yi Xu, Juyong Zhang

TL;DR
SparseLGS introduces a novel approach for 3D scene understanding using sparse, pose-free images, leveraging a learning-based stereo model and region matching to achieve high-quality semantic reconstructions with fewer inputs and faster computation.
Contribution
It is the first to address 3D semantic field reconstruction with sparse, pose-free views, improving efficiency and reducing input requirements compared to prior dense-view methods.
Findings
Achieves comparable semantic reconstruction quality with 3-4 sparse views.
Speeds up computation by a factor of 5 relative to prior methods.
Outperforms previous state-of-the-art methods with fewer inputs.
Abstract
Recently, several studies have combined Gaussian Splatting with language embeddings to obtain scene representations for open-vocabulary 3D scene understanding. While these methods perform well, they require very dense multi-view inputs, which limits their applicability in real-world scenarios. In this work, we propose SparseLGS to address the challenge of 3D scene understanding with pose-free and sparse view input images. Our method leverages a learning-based dense stereo model to handle pose-free and sparse inputs, and a three-step region matching approach to address the multi-view semantic inconsistency problem, which is especially important for sparse inputs. Rather than directly learning high-dimensional CLIP features, we extract low-dimensional information and build bijections to avoid excessive learning and storage costs. We introduce a reconstruction loss during…
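The abstract excerpt does not say how the low-dimensional information or the bijections are built, so the following is only a minimal sketch of the general idea: learn and store a small code per region, and keep a one-to-one lookup back to the original high-dimensional feature. The function names `build_codebook` and `decode`, the 3-dimensional code size, and the use of random codes are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def build_codebook(region_features, dim=3, seed=0):
    """Pair each distinct high-dimensional feature with a unique low-dim code.

    Hypothetical helper: random codes stand in for whatever low-dimensional
    information the paper actually extracts.
    """
    feats = np.unique(np.asarray(region_features, dtype=np.float32), axis=0)
    rng = np.random.default_rng(seed)
    codes = rng.random((len(feats), dim), dtype=np.float32)
    # The mapping is one-to-one only if no two codes coincide.
    assert len(np.unique(codes, axis=0)) == len(codes)
    return codes, feats

def decode(query_codes, codes, feats):
    """Map low-dim codes back to high-dim features via the nearest stored code."""
    diffs = query_codes[:, None, :] - codes[None, :, :]
    nearest = np.linalg.norm(diffs, axis=-1).argmin(axis=1)
    return feats[nearest]
```

Under these assumptions, a scene representation would only need to carry the `dim`-sized codes, while the full-size features are stored once per distinct region in `feats` and recovered with `decode` at query time.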
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods · Advanced Clustering Algorithms Research
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Contrastive Language-Image Pre-training
