SparseLGS: Sparse View Language Embedded Gaussian Splatting

Jun Hu; Zhang Chen; Zhong Li; Yi Xu; Juyong Zhang

arXiv:2412.02245 · cs.CV · December 5, 2024

TL;DR

SparseLGS introduces a novel approach for 3D scene understanding using sparse, pose-free images, leveraging a learning-based stereo model and region matching to achieve high-quality semantic reconstructions with fewer inputs and faster computation.

Contribution

SparseLGS is the first method to address 3D semantic field reconstruction from sparse, pose-free views; it improves efficiency and reduces input requirements compared to prior dense-view methods.

Findings

01. With only 3-4 sparse views, achieves semantic reconstruction quality comparable to prior dense-view methods.

02. Improves computation speed by a factor of 5.

03. Outperforms previous state-of-the-art methods while using fewer inputs.

Abstract

Recently, several studies have combined Gaussian Splatting with language embeddings to obtain scene representations for open-vocabulary 3D scene understanding. While these methods perform well, they require very dense multi-view inputs, limiting their applicability in real-world scenarios. In this work, we propose SparseLGS to address the challenge of 3D scene understanding with pose-free and sparse view input images. Our method leverages a learning-based dense stereo model to handle pose-free and sparse inputs, and a three-step region matching approach to address the multi-view semantic inconsistency problem, which is especially important for sparse inputs. Rather than directly learning high-dimensional CLIP features, we extract low-dimensional information and build bijections to avoid excessive learning and storage costs. We introduce a reconstruction loss during…
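The bijection between high-dimensional features and low-dimensional codes mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `build_codebook` and `decode`, the 3-D grid layout of the codes, and the nearest-code lookup are all assumptions made for illustration. The idea shown is only that each distinct region-level feature receives a unique compact code, so that a scene representation can store the short code and recover the full feature by lookup.

```python
import math


def build_codebook(features):
    """Assign each distinct high-dimensional feature a unique 3-D code.

    Hypothetical helper: codes sit on a regular grid in [0, 1]^3. The grid
    layout is an illustrative assumption, not the paper's construction.
    """
    # Deduplicate while keeping first-seen order.
    unique = list(dict.fromkeys(tuple(f) for f in features))

    # Smallest grid side (at least 2) whose cube holds every feature.
    side = 2
    while side ** 3 < len(unique):
        side += 1

    feat_to_code, code_to_feat = {}, {}
    for i, feat in enumerate(unique):
        x, y, z = i % side, (i // side) % side, i // (side * side)
        code = (x / (side - 1), y / (side - 1), z / (side - 1))
        feat_to_code[feat] = code
        code_to_feat[code] = feat
    return feat_to_code, code_to_feat


def decode(code, code_to_feat):
    """Map a (possibly noisy) low-dimensional code back to its feature."""
    nearest = min(code_to_feat, key=lambda c: math.dist(c, code))
    return code_to_feat[nearest]


# Five made-up 16-dimensional "features" standing in for CLIP embeddings.
feats = [tuple(float(i + j) for j in range(16)) for i in range(5)]
feat_to_code, code_to_feat = build_codebook(feats)
recovered = decode(feat_to_code[feats[2]], code_to_feat)
```

Because the mapping is one-to-one, only the 3-D code needs to be stored per primitive in this sketch, and the original feature is recovered exactly whenever the rendered code stays closer to its own grid point than to any other.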

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods · Advanced Clustering Algorithms Research

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Contrastive Language-Image Pre-training