OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies
Runnan Chen, Xiangyu Sun, Zhaoqing Wang, Youquan Liu, Jiepeng Wang, Lingdong Kong, Jiankang Deng, Mingming Gong, Liang Pan, Wenping Wang, Tongliang Liu

TL;DR
OVGaussian introduces a novel 3D Gaussian-based framework for open-vocabulary scene understanding, leveraging a large-scale dataset and cross-modal learning to achieve strong generalization across scenes and views.
Contribution
It presents a new generalizable 3D semantic segmentation method using 3D Gaussians, along with a large-scale dataset and a cross-modal training framework.
Findings
Outperforms baseline methods in cross-scene and cross-domain tasks
Demonstrates robust generalization to novel views and scenes
Provides a large-scale annotated 3D scene dataset
Abstract
Open-vocabulary scene understanding using 3D Gaussian Splatting (3DGS) representations has garnered considerable attention. However, existing methods mostly lift knowledge from large 2D vision models into 3DGS on a scene-by-scene basis, which restricts open-vocabulary querying to the training scenes and leaves these methods unable to generalize to novel scenes. In this work, we propose OVGaussian, a generalizable Open-Vocabulary 3D semantic segmentation framework based on the 3D Gaussian representation. We first construct a large-scale 3D scene dataset based on 3DGS, dubbed SegGaussian, which provides detailed semantic and instance annotations for both Gaussian points and multi-view images. To promote semantic generalization across scenes, we introduce Generalizable Semantic Rasterization (GSR), which leverages a 3D neural network to learn…
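
To make "open-vocabulary querying" concrete, here is a minimal sketch of the generic idea: each Gaussian carries a semantic feature vector, each text prompt has an embedding in the same space, and a Gaussian receives the label of the most similar prompt. This is not OVGaussian's implementation. The function name assign_labels, the array shapes, and the toy vectors are assumptions made for illustration, and a real system would obtain the text embeddings from a vision-language model.

```python
import numpy as np


def assign_labels(gaussian_feats, text_embeds):
    """Label each Gaussian with the index of its most similar text prompt.

    gaussian_feats: (N, D) array, one hypothetical semantic feature per Gaussian.
    text_embeds:    (C, D) array, one hypothetical embedding per text prompt.
    Returns an (N,) integer array of prompt indices.
    """
    # L2-normalize both sides so the dot product equals cosine similarity.
    g = gaussian_feats / np.linalg.norm(gaussian_feats, axis=1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    similarity = g @ t.T  # shape (N, C)
    return similarity.argmax(axis=1)


# Toy data (assumed): 4 Gaussians, 3 prompts, 3-dimensional features.
gaussian_feats = np.array([
    [1.0, 0.1, 0.0],
    [0.0, 2.0, 0.1],
    [0.1, 0.0, 3.0],
    [0.9, 0.2, 0.0],
])
text_embeds = np.array([
    [1.0, 0.0, 0.0],  # e.g. "chair"
    [0.0, 1.0, 0.0],  # e.g. "table"
    [0.0, 0.0, 1.0],  # e.g. "floor"
])

labels = assign_labels(gaussian_feats, text_embeds)
print(labels.tolist())
```

Because the label set is just a list of prompt embeddings, new categories can be queried by adding rows to text_embeds without retraining the per-Gaussian features, which is what makes the vocabulary "open" in this simplified picture.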
Taxonomy
Topics: Image Processing and 3D Reconstruction · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
