Gen-LangSplat: Generalized Language Gaussian Splatting with Pre-Trained Feature Compression

Pranav Saxena

arXiv:2510.22930·cs.CV·October 28, 2025

TL;DR

Gen-LangSplat introduces a scalable, scene-agnostic approach for 3D language field modeling by replacing scene-specific autoencoders with a pre-trained generalized autoencoder, enabling efficient open-vocabulary querying across diverse environments.

Contribution

It proposes a generalized autoencoder trained on large-scale data, removing the need for scene-specific training in 3D language field construction.

Findings

01

Achieves querying performance comparable to or better than scene-specific methods.

02

Demonstrates efficient, scalable 3D language modeling without scene-specific training.

03

Validates the effectiveness of a fixed latent space for open-vocabulary querying.

Abstract

Modeling open-vocabulary language fields in 3D is essential for intuitive human-AI interaction and querying within physical environments. State-of-the-art approaches, such as LangSplat, leverage 3D Gaussian Splatting to efficiently construct these language fields, encoding features distilled from high-dimensional models like CLIP. However, this efficiency is currently offset by the requirement to train a scene-specific language autoencoder for feature compression, which introduces a costly per-scene optimization bottleneck that hinders deployment scalability. In this work, we introduce Gen-LangSplat, which eliminates this requirement by replacing the scene-wise autoencoder with a generalized autoencoder pre-trained extensively on the large-scale ScanNet dataset. This architectural shift enables the use of a fixed, compact latent space for language features across any new scene without any…
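The idea of a fixed, scene-agnostic compression step can be illustrated with a minimal sketch. This is not the paper's implementation: the class `FrozenLinearAutoencoder`, the linear architecture, the random orthonormal weights, the latent size of 16, the 512-dimensional input, and the cosine-similarity query are all assumptions chosen for illustration. The point it shows is that the same frozen encoder and decoder are reused for every scene, so no per-scene autoencoder training is needed.

```python
import numpy as np

CLIP_DIM = 512    # assumed CLIP feature size, for illustration only
LATENT_DIM = 16   # hypothetical latent size; not stated in the text above


class FrozenLinearAutoencoder:
    """Stand-in for a pre-trained, scene-agnostic autoencoder (hypothetical).

    The weights here are random orthonormal directions. A real model would
    load weights trained once on a large dataset and reuse them unchanged.
    """

    def __init__(self, in_dim: int, latent_dim: int, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        # Reduced QR yields an (in_dim, latent_dim) matrix with orthonormal columns.
        q, _ = np.linalg.qr(rng.standard_normal((in_dim, latent_dim)))
        self.basis = q

    def encode(self, feats: np.ndarray) -> np.ndarray:
        """Compress (N, in_dim) features to (N, latent_dim) latents."""
        return feats @ self.basis

    def decode(self, latents: np.ndarray) -> np.ndarray:
        """Map (N, latent_dim) latents back to (N, in_dim) features."""
        return latents @ self.basis.T


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def relevance(decoded_feats: np.ndarray, text_embedding: np.ndarray) -> np.ndarray:
    """Cosine similarity between each decoded feature and one query embedding."""
    return l2_normalize(decoded_feats) @ l2_normalize(text_embedding)


# One frozen autoencoder, shared by every scene.
autoencoder = FrozenLinearAutoencoder(CLIP_DIM, LATENT_DIM)

# Placeholder features standing in for language features of one scene.
rng = np.random.default_rng(1)
scene_feats = l2_normalize(rng.standard_normal((5, CLIP_DIM)))

latents = autoencoder.encode(scene_feats)   # compact codes stored per scene
recon = autoencoder.decode(latents)         # decoded at query time

# Pretend the text query embedding coincides with item 2's feature.
query = scene_feats[2]
scores = relevance(recon, query)
best = int(np.argmax(scores))
print(latents.shape, recon.shape, best)
```

In this toy setup only the compact latents would need to be stored per scene, while the decoder stays fixed. A learned nonlinear autoencoder would replace the random projection in practice.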
