OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies

Runnan Chen; Xiangyu Sun; Zhaoqing Wang; Youquan Liu; Jiepeng Wang; Lingdong Kong; Jiankang Deng; Mingming Gong; Liang Pan; Wenping Wang; Tongliang Liu

arXiv:2501.00326 · cs.CV · January 3, 2025

TL;DR

OVGaussian introduces a novel 3D Gaussian-based framework for open-vocabulary scene understanding, leveraging a large-scale dataset and cross-modal learning to achieve strong generalization across scenes and views.

Contribution

It presents a new generalizable 3D semantic segmentation method using 3D Gaussians, along with a large-scale dataset and a cross-modal training framework.

Findings

1. Outperforms baseline methods in cross-scene and cross-domain tasks
2. Demonstrates robust generalization to novel views and scenes
3. Provides a large-scale annotated 3D scene dataset

Abstract

Open-vocabulary scene understanding using 3D Gaussian Splatting (3DGS) representations has garnered considerable attention. However, most existing methods lift knowledge from large 2D vision models into 3DGS on a scene-by-scene basis, which restricts open-vocabulary querying to their training scenes and leaves them unable to generalize to novel scenes. In this work, we propose OVGaussian, a generalizable Open-Vocabulary 3D semantic segmentation framework based on the 3D Gaussian representation. We first construct a large-scale 3D scene dataset based on 3DGS, dubbed SegGaussian, which provides detailed semantic and instance annotations for both Gaussian points and multi-view images. To promote semantic generalization across scenes, we introduce Generalizable Semantic Rasterization (GSR), which leverages a 3D neural network to learn…
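The abstract is cut off before it specifies how GSR works, so the following is only background, not OVGaussian's method. A common way for 3DGS-based methods to render semantics is to reuse the 3DGS alpha-blending rule with a per-Gaussian feature vector f_i in place of color, f(p) = Σ_i α_i f_i ∏_{j<i} (1 - α_j), and then answer an open-vocabulary query by comparing the rendered feature with text embeddings. The minimal sketch below assumes exactly that. The function names (`composite_features`, `cosine_similarity`, `query_label`), the 3-dimensional features, and the "chair"/"table" embeddings are hypothetical placeholders. In practice the text embeddings would come from a vision-language model such as CLIP.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical helper names. This is NOT OVGaussian's GSR, only the standard
# 3DGS alpha-blending rule applied to feature vectors instead of RGB colors.


def composite_features(
    alphas: Sequence[float], features: Sequence[Sequence[float]]
) -> List[float]:
    """Blend per-Gaussian feature vectors for a single pixel, front to back.

    Assumes the Gaussians are already depth-sorted (nearest first) and that each
    alpha is the Gaussian's opacity times its projected 2D footprint at this pixel.
    """
    if not features:
        return []
    out = [0.0] * len(features[0])
    transmittance = 1.0  # fraction of the ray not yet absorbed by nearer Gaussians
    for alpha, feat in zip(alphas, features):
        weight = alpha * transmittance  # alpha_i * prod_{j<i} (1 - alpha_j)
        for k, value in enumerate(feat):
            out[k] += weight * value
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # early ray termination, as in 3DGS rasterizers
            break
    return out


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def query_label(
    pixel_feature: Sequence[float], text_embeddings: Dict[str, Sequence[float]]
) -> str:
    """Open-vocabulary lookup: the class name whose text embedding is most similar."""
    return max(
        text_embeddings,
        key=lambda name: cosine_similarity(pixel_feature, text_embeddings[name]),
    )


if __name__ == "__main__":
    # Toy 3-D "embeddings" standing in for real vision-language text features.
    pixel = composite_features([0.6, 0.5], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    print(pixel)  # [0.6, 0.2, 0.0]
    print(query_label(pixel, {"chair": [1.0, 0.0, 0.0], "table": [0.0, 1.0, 0.0]}))  # chair
```

Running the script prints [0.6, 0.2, 0.0] and then chair. The nearer Gaussian receives weight 0.6, and the farther one receives 0.5 × (1 - 0.6) = 0.2. Depth sorting, projection of each 3D Gaussian to 2D, and the 3D network that would predict per-Gaussian features in a generalizable setting are all omitted.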

Taxonomy

Topics: Image Processing and 3D Reconstruction · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques