OpenGaFF: Open-Vocabulary Gaussian Feature Field with Codebook Attention
Kunyi Li, Michael Niemeyer, Sen Wang, Stefano Gasperini, Nassir Navab, Federico Tombari

TL;DR
OpenGaFF is a 3D scene understanding framework built on 3D Gaussian Splatting that models semantics as a continuous function of Gaussian geometry and appearance, improving open-vocabulary reasoning and spatial coherence in 3D scenes.
Contribution
The paper introduces a Gaussian Feature Field, a structured codebook of shared semantic primitives, and a codebook-guided attention mechanism, which together improve semantic consistency in open-vocabulary 3D scene understanding.
Findings
Outperforms prior methods on standard benchmarks
Achieves better segmentation quality and semantic consistency
Provides a semantically interpretable learned representation
Abstract
Understanding open-vocabulary 3D scenes with Gaussian-based representations remains challenging due to fragmented and spatially inconsistent semantic predictions across multi-view observations. In this paper, we present OpenGaFF, a novel framework for open-vocabulary 3D scene understanding built upon 3D Gaussian Splatting. At the core of our method is a Gaussian Feature Field that models semantics as a continuous function of Gaussian geometry and appearance. By explicitly conditioning semantic predictions on geometric structure, this formulation strengthens the coupling between geometry and semantics, leading to improved spatial coherence across similar structures in 3D space. To further enforce object-level semantic consistency, we introduce a structured codebook that serves as a set of shared semantic primitives. Furthermore, a codebook-guided attention mechanism is proposed to…
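The two ideas described in the abstract, a feature field conditioned on Gaussian attributes and a shared codebook read through attention, can be illustrated with a small sketch. This is not the authors' implementation: the abstract is truncated before the attention mechanism is specified, so the two-layer MLP, the scaled dot-product attention, the 14-dimensional attribute layout, and every name below (GaussianFeatureFieldSketch, attr_dim, num_codes, and so on) are assumptions chosen only for illustration. Weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class GaussianFeatureFieldSketch:
    """Hypothetical sketch: Gaussian attributes -> query -> attention over a codebook."""

    def __init__(self, attr_dim, hidden_dim, feat_dim, num_codes, seed=0):
        rng = np.random.default_rng(seed)
        # Assumed two-layer MLP that turns per-Gaussian attributes into a query.
        self.w1 = rng.normal(0.0, 0.1, (attr_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0.0, 0.1, (hidden_dim, feat_dim))
        self.b2 = np.zeros(feat_dim)
        # Shared codebook: one row per semantic primitive (random here, learned in practice).
        self.codebook = rng.normal(0.0, 1.0, (num_codes, feat_dim))

    def queries(self, attrs):
        """Map (N, attr_dim) attributes to (N, feat_dim) queries with a ReLU MLP."""
        hidden = np.maximum(attrs @ self.w1 + self.b1, 0.0)
        return hidden @ self.w2 + self.b2

    def forward(self, attrs):
        """Return per-Gaussian features and their attention weights over the codebook."""
        q = self.queries(attrs)
        # Assumed scaled dot-product attention between queries and codebook entries.
        logits = q @ self.codebook.T / np.sqrt(q.shape[-1])
        weights = softmax(logits, axis=-1)
        # Each feature is a convex combination of the shared codebook entries.
        feats = weights @ self.codebook
        return feats, weights


rng = np.random.default_rng(1)
# Assumed attribute layout per Gaussian:
# position (3) + scale (3) + rotation quaternion (4) + opacity (1) + RGB color (3) = 14.
attrs = rng.normal(size=(5, 14))
field = GaussianFeatureFieldSketch(attr_dim=14, hidden_dim=32, feat_dim=16, num_codes=8)
feats, weights = field.forward(attrs)
print(feats.shape, weights.shape)
```

Because the features are a deterministic function of the attributes, two Gaussians with identical geometry and appearance receive identical features in this sketch, and every feature lies in the span of the same codebook entries. That is one simple way to read the abstract's claims about spatial coherence and shared semantic primitives; the paper's actual mechanism may differ.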
