Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation
Tri Ton, Ji Woo Hong, SooHwan Eom, Jun Yeop Shim, Junyeong Kim, Chang D. Yoo

TL;DR
This paper introduces a zero-shot dual-path framework combining 3D point cloud and 2D multi-view image data to improve open-vocabulary 3D instance segmentation, effectively identifying both seen and unseen objects.
Contribution
It proposes a novel dual-path integration framework that leverages pre-trained models in a zero-shot setting, enhancing segmentation of diverse objects beyond traditional methods.
Findings
Outperforms existing methods on the ScanNet200 dataset
Effectively identifies unseen object categories
Demonstrates robustness across different datasets
Abstract
Open-vocabulary 3D instance segmentation transcends traditional closed-vocabulary methods by enabling the identification of both previously seen and unseen objects in real-world scenarios. It leverages a dual-modality approach, utilizing both 3D point clouds and 2D multi-view images to generate class-agnostic object mask proposals. Previous efforts predominantly focused on enhancing 3D mask proposal models; consequently, the information that could come from 2D association to 3D was not fully exploited. This bias towards 3D data, while effective for familiar indoor objects, limits the system's adaptability to new and varied object types, where 2D models offer greater utility. Addressing this gap, we introduce the Zero-Shot Dual-Path Integration Framework, which equally values the contributions of both 3D and 2D modalities. Our framework comprises three components: 3D pathway, 2D pathway, and…
Taxonomy
Topics: Advanced Neural Network Applications · Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction
