Lang3D-XL: Language Embedded 3D Gaussians for Large-scale Scenes

Shai Krakovsky; Gal Fiebelman; Sagie Benaim; Hadar Averbuch-Elor

arXiv:2512.07807·cs.CV·December 9, 2025

TL;DR

Lang3D-XL embeds a language field in a 3D Gaussian representation of large-scale scenes, enabling natural language querying and editing while addressing semantic feature misalignment and inefficiency in memory and runtime.

Contribution

The paper proposes extremely low-dimensional semantic bottleneck features and hash encoding to improve efficiency, and introduces regularizations to mitigate semantic misalignment in large-scale scene understanding.
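
To make the bottleneck idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the dimensions (3 and 8), the random linear decoder, and the function names `make_decoder`, `decode`, `cosine`, and `relevance` are all assumptions chosen for illustration, and the hash encoding and regularizations are left out entirely. It only shows the general pattern of storing a small semantic code per Gaussian, lifting it to a larger feature space, and scoring it against a text embedding.

```python
import math
import random

BOTTLENECK_DIM = 3  # assumed size of the per-Gaussian semantic code
FEATURE_DIM = 8     # assumed size of the language feature space (toy value)


def make_decoder(in_dim, out_dim, seed=0):
    """Random linear map standing in for a learned decoder (hypothetical)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def decode(weights, code):
    """Lift a low-dimensional code to the larger feature space."""
    return [sum(w * c for w, c in zip(row, code)) for row in weights]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def relevance(weights, codes, text_embedding):
    """Score each stored code against a text embedding after decoding."""
    return [cosine(decode(weights, code), text_embedding) for code in codes]


if __name__ == "__main__":
    decoder = make_decoder(BOTTLENECK_DIM, FEATURE_DIM)
    # Three toy Gaussians, each carrying only a 3-dimensional semantic code.
    codes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]]
    # Stand-in for a text embedding; a real system would use a text encoder.
    query = decode(decoder, [1.0, 0.0, 0.0])
    for code, score in zip(codes, relevance(decoder, codes, query)):
        print(code, round(score, 3))
```

Under these assumed sizes each Gaussian stores 3 numbers rather than 8; the saving grows with the dimension of the target feature space, which is the efficiency argument the contribution statement points to.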

Findings

01 Outperforms existing methods on the HolyScenes dataset

02 Achieves better efficiency in runtime and memory usage

03 Enhances semantic alignment for natural language scene understanding

Abstract

Embedding a language field in a 3D representation enables richer semantic understanding of spatial environments by linking geometry with descriptive meaning. This allows for a more intuitive human-computer interaction, enabling querying or editing scenes using natural language, and could potentially improve tasks like scene retrieval, navigation, and multimodal reasoning. While such capabilities could be transformative, in particular for large-scale scenes, we find that recent feature distillation approaches cannot effectively learn over massive Internet data due to challenges in semantic feature misalignment and inefficiency in memory and runtime. To this end, we propose a novel approach to address these challenges. First, we introduce extremely low-dimensional semantic bottleneck features as part of the underlying 3D Gaussian representation. These are processed by rendering and…

Taxonomy

Topics: 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications