Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance

Phuc D.A. Nguyen; Tuan Duc Ngo; Evangelos Kalogerakis; Chuang Gan; Anh Tran; Cuong Pham; Khoi Nguyen

arXiv:2312.10671 · cs.CV · April 9, 2024 · 2 cites

TL;DR

Open3DIS introduces a method that leverages 2D mask aggregation across frames to improve open-vocabulary 3D instance segmentation, especially for small and ambiguous objects, achieving state-of-the-art results.

Contribution

The paper presents a novel module that aggregates 2D instance masks across frames, maps them to geometrically coherent point cloud regions, and combines the resulting proposals with 3D proposals to improve open-vocabulary 3D instance segmentation.

Findings

01. Significant performance improvements on ScanNet200, S3DIS, and Replica datasets.

02. Effective segmentation of small-scale and geometrically ambiguous objects.

03. Outperforms existing state-of-the-art methods in open-vocabulary 3D segmentation.

Abstract

We introduce Open3DIS, a novel solution designed to tackle the problem of Open-Vocabulary Instance Segmentation within 3D scenes. Objects within 3D environments exhibit diverse shapes, scales, and colors, making precise instance-level identification a challenging task. Recent advancements in Open-Vocabulary scene understanding have made significant strides in this area by employing class-agnostic 3D instance proposal networks for object localization and learning queryable features for each 3D mask. While these methods produce high-quality instance proposals, they struggle with identifying small-scale and geometrically ambiguous objects. The key idea of our method is a new module that aggregates 2D instance masks across frames and maps them to geometrically coherent point cloud regions as high-quality object proposals addressing the above limitations. These are then combined with 3D…
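
The cross-frame aggregation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a pinhole camera, assumes the point cloud is already expressed in each frame's camera coordinates with a consistent point ordering, ignores occlusion, and swaps in a simple greedy IoU merge for whatever grouping the paper actually uses. All names here (`project`, `lift_masks`, `aggregate`, `iou_thresh`) are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Point = Tuple[float, float, float]
Pixel = Tuple[int, int]
Mask = Set[Pixel]  # pixels (u, v) covered by one 2D instance mask


def project(point: Point, fx: float, fy: float, cx: float, cy: float) -> Optional[Pixel]:
    """Pinhole projection of a camera-space point; None if behind the camera."""
    x, y, z = point
    if z <= 0:
        return None
    return (int(round(fx * x / z + cx)), int(round(fy * y / z + cy)))


def lift_masks(points_cam: List[Point], masks: List[Mask],
               intrinsics: Tuple[float, float, float, float]) -> List[Set[int]]:
    """For one frame, collect the point indices that project into each 2D mask."""
    fx, fy, cx, cy = intrinsics
    regions: List[Set[int]] = [set() for _ in masks]
    for idx, p in enumerate(points_cam):
        pix = project(p, fx, fy, cx, cy)
        if pix is None:
            continue
        for m, mask in enumerate(masks):
            if pix in mask:
                regions[m].add(idx)
    return [r for r in regions if r]  # drop masks that hit no points


def iou(a: Set[int], b: Set[int]) -> float:
    """Intersection over union of two point-index sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def aggregate(frames_regions: List[List[Set[int]]],
              iou_thresh: float = 0.5) -> List[Set[int]]:
    """Greedily merge per-frame point regions into 3D proposals (assumed rule)."""
    proposals: List[Set[int]] = []
    for regions in frames_regions:
        for region in regions:
            best = max(proposals, key=lambda p: iou(p, region), default=None)
            if best is not None and iou(best, region) >= iou_thresh:
                best |= region  # same object seen again: grow the proposal
            else:
                proposals.append(set(region))  # unseen object: new proposal
    return proposals
```

To use the sketch, call `lift_masks` once per frame with that frame's 2D masks, then pass the per-frame results to `aggregate`. Each returned set of point indices is one candidate 3D instance. A real pipeline would also need depth-based visibility checks and a way to attach open-vocabulary features to each proposal.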

Code & Models

Repositories

VinAIResearch/Open3DIS
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Human Pose and Action Recognition