OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding

Yanmin Wu; Jiarui Meng; Haijie Li; Chenming Wu; Yahao Shi; Xinhua Cheng; Chen Zhao; Haocheng Feng; Errui Ding; Jingdong Wang; Jian Zhang

arXiv:2406.02058 · cs.CV · December 9, 2024 · 2 cites

TL;DR

OpenGaussian introduces a novel 3D Gaussian-based approach for point-level open vocabulary understanding, improving feature expressiveness and 3D-2D feature association for robust 3D scene comprehension.

Contribution

It proposes a two-stage feature discretization and an instance-level 3D-2D feature association method for enhanced 3D point-level open vocabulary understanding.

Findings

01. Effective 3D object selection and understanding demonstrated
02. Improved 3D-2D feature association accuracy
03. Robustness in open vocabulary 3D scene understanding

Abstract

This paper introduces OpenGaussian, a method based on 3D Gaussian Splatting (3DGS) capable of 3D point-level open vocabulary understanding. Our primary motivation stems from observing that existing 3DGS-based open vocabulary methods mainly focus on 2D pixel-level parsing. These methods struggle with 3D point-level tasks due to weak feature expressiveness and inaccurate 2D-3D feature associations. To ensure robust feature presentation and 3D point-level understanding, we first employ SAM masks without cross-frame associations to train instance features with 3D consistency. These features exhibit both intra-object consistency and inter-object distinction. Then, we propose a two-stage codebook to discretize these features from coarse to fine levels. At the coarse level, we consider the positional information of 3D points to achieve location-based clustering, which is then refined at the…
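The coarse-to-fine discretization mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain k-means at both levels, with the coarse level clustering features concatenated with 3D positions and the fine level clustering features alone inside each coarse cluster. The names `kmeans`, `two_stage_codebook`, `k_coarse`, `k_fine`, and `pos_weight` are hypothetical and chosen only for illustration.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in). Returns (centroids, labels)."""
    x = np.asarray(x, dtype=np.float64)
    rng = np.random.default_rng(seed)
    k = min(k, len(x))  # never ask for more clusters than points
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every point to every centroid.
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return centroids, dist.argmin(axis=1)


def two_stage_codebook(xyz, feats, k_coarse=4, k_fine=2, pos_weight=1.0):
    """Coarse level: cluster [features, position]. Fine level: features only."""
    xyz = np.asarray(xyz, dtype=np.float64)
    feats = np.asarray(feats, dtype=np.float64)
    # Coarse stage uses positional information alongside the features.
    coarse_input = np.concatenate([feats, pos_weight * xyz], axis=1)
    _, coarse = kmeans(coarse_input, k_coarse)

    codebook = []
    codes = np.empty(len(feats), dtype=np.int64)
    for c in np.unique(coarse):
        idx = np.flatnonzero(coarse == c)
        # Fine stage refines each coarse cluster using features alone.
        centroids, fine = kmeans(feats[idx], k_fine)
        codes[idx] = len(codebook) + fine  # global code index per point
        codebook.extend(centroids)
    return np.stack(codebook), codes
```

Given per-point positions `xyz` of shape `(N, 3)` and instance features `feats` of shape `(N, D)`, `codebook[codes]` gives the discretized feature for each point. The position term in the coarse stage is what keeps spatially distant points with similar features from sharing a code in this sketch.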

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems

Methods: Focus · Contrastive Language-Image Pre-training · Segment Anything Model