Sparse Code Uplifting for Efficient 3D Language Gaussian Splatting
Lovre Antonio Budimir, Yushi Guan, Steve Ryhner, Sven Lončarić, and Nandita Vijaykumar

TL;DR
SCOUP introduces a sparse codebook approach for 3D Gaussian Splatting that significantly improves training speed and memory efficiency while maintaining high-quality open-vocabulary 3D scene understanding.
Contribution
SCOUP decouples language representation learning from 3D Gaussian optimization using 2D region-based sparse codes, enabling fast, efficient, and accurate 3D scene understanding.
Findings
Trains up to 400x faster.
Uses 3x less memory during training.
Matches or outperforms existing methods in open-vocabulary accuracy.
Abstract
3D Language Gaussian Splatting (3DLGS) augments 3D Gaussian Splatting with language-aligned visual features for open-vocabulary 3D scene understanding. A core challenge is efficiently associating high-dimensional vision-language embeddings with millions of 3D Gaussians while preserving efficient feature rendering for text-based querying. Existing methods either store dense features directly on Gaussians, causing high storage costs and slow rendering, or learn compact representations through expensive per-scene optimization with repeated feature rasterization. No existing method simultaneously achieves fast 3D semantic reconstruction, efficient storage, and fast rendering. We propose SCOUP (Sparse COde UPlifting), which addresses all three by decoupling language representation learning from 3D Gaussian optimization. Rather than working directly in 3D, we learn sparse codebook-based…
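The storage trade-off described in the abstract can be illustrated with a minimal sketch. This is not SCOUP's implementation: the abstract is truncated before the method details, so the sizes, the random codebook, and the helper names `decode` and `query_scores` are all assumptions chosen for illustration. The sketch only shows the general idea of a codebook-based sparse representation: each Gaussian keeps a few indices and weights into a shared codebook instead of a dense embedding, and a text query can be scored against the codebook once and then combined per Gaussian, because the dot product is linear.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes are illustrative assumptions, not values from the paper.
num_gaussians = 1_000    # real scenes have millions of Gaussians
feat_dim = 512           # dimension of a vision-language embedding
codebook_size = 256      # number of shared codebook atoms
nnz = 4                  # nonzero coefficients kept per Gaussian

# Hypothetical codebook of unit-norm atoms (random here, learned in practice).
codebook = rng.standard_normal((codebook_size, feat_dim)).astype(np.float32)
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)

# Sparse code per Gaussian: nnz atom indices plus nnz weights.
indices = rng.integers(0, codebook_size, size=(num_gaussians, nnz), dtype=np.int32)
weights = rng.random((num_gaussians, nnz)).astype(np.float32)
weights /= weights.sum(axis=1, keepdims=True)


def decode(indices, weights, codebook):
    """Reconstruct dense features as weighted sums of codebook atoms."""
    return np.einsum("gk,gkd->gd", weights, codebook[indices])


def query_scores(text_embedding, indices, weights, codebook):
    """Score every Gaussian against a text embedding without decoding.

    By linearity, (sum_k w_k * atom_k) . t == sum_k w_k * (atom_k . t),
    so only codebook_size dot products of length feat_dim are computed.
    """
    atom_scores = codebook @ text_embedding            # (codebook_size,)
    return (weights * atom_scores[indices]).sum(axis=1)


# Stand-in for a text embedding from a vision-language model.
text_embedding = rng.standard_normal(feat_dim).astype(np.float32)
text_embedding /= np.linalg.norm(text_embedding)

scores_sparse = query_scores(text_embedding, indices, weights, codebook)
scores_dense = decode(indices, weights, codebook) @ text_embedding

# Storage comparison: dense float32 features vs. sparse codes plus codebook.
dense_bytes = num_gaussians * feat_dim * 4
sparse_bytes = indices.nbytes + weights.nbytes + codebook.nbytes

print("max score difference:", float(np.abs(scores_sparse - scores_dense).max()))
print("dense bytes:", dense_bytes, "sparse bytes:", sparse_bytes)
```

In this toy setup the codebook itself dominates the sparse storage; its cost is fixed, so the gap to dense storage widens as the number of Gaussians grows.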
