OV-MAP : Open-Vocabulary Zero-Shot 3D Instance Segmentation Map for Robots

Juno Kim; Yesol Park; Hye-Jung Yoon; Byoung-Tak Zhang

arXiv:2506.11585 · cs.CV · June 16, 2025

TL;DR

OV-MAP enables open-vocabulary zero-shot 3D instance segmentation for robots by combining 2D mask projection, depth merging, and 3D mask voting; it performs robustly without relying on 3D-supervised segmentation models.
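To make the "2D mask projection" step concrete, here is a minimal sketch of lifting one 2D instance mask into 3D. It assumes a pinhole camera with intrinsics fx, fy, cx, cy, a depth image in meters in which 0.0 marks a missing reading, and camera-frame output (no camera pose applied). The function name `backproject_mask` is hypothetical; this illustrates the general technique and is not the authors' implementation.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def backproject_mask(mask: List[List[bool]], depth: List[List[float]],
                     fx: float, fy: float, cx: float, cy: float) -> List[Point3]:
    """Lift the pixels of one binary 2D instance mask to camera-frame 3D points.

    Hypothetical helper. Pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    Pixels whose depth is 0.0 (no reading) are skipped.
    """
    points: List[Point3] = []
    for v, row in enumerate(mask):          # v = pixel row
        for u, inside in enumerate(row):    # u = pixel column
            z = depth[v][u]
            if not inside or z <= 0.0:
                continue  # outside the mask, or no depth reading here
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


mask = [[False, True],
        [True, False]]
depth = [[1.0, 2.0],
         [0.0, 1.5]]
lifted = backproject_mask(mask, depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
# lifted == [(2.0, 0.0, 2.0)]; the mask pixel at (u=0, v=1) has no depth and is dropped
```

Pixels without a depth reading simply vanish from the projected mask, which is one plausible motivation for supplementing raw sensor depth before projection.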

Contribution

This work introduces a novel open-world 3D mapping approach for mobile robots that combines class-agnostic 2D segmentation, depth fusion, and 3D mask voting to improve instance-level, zero-shot object recognition.
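The depth-fusion idea can be sketched in two steps: render a synthetic depth image by z-buffering map points (assumed to be already in the camera frame) through the same pinhole intrinsics, then keep valid raw readings and fill raw holes from the synthetic image. The hole-filling rule, the use of 0.0 for missing depth, and the names `render_depth` and `merge_depth` are illustrative assumptions; the paper may reconcile the two depth sources differently.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Depth = List[List[float]]  # row-major, depth[v][u] in meters; 0.0 = no reading


def render_depth(points: List[Point3], width: int, height: int,
                 fx: float, fy: float, cx: float, cy: float) -> Depth:
    """Hypothetical helper: z-buffer camera-frame points into a synthetic depth image."""
    depth: Depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0.0:
            continue  # point is behind the camera
        u = round(fx * x / z + cx)  # nearest pixel column
        v = round(fy * y / z + cy)  # nearest pixel row
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z  # keep the nearest surface per pixel
    return depth


def merge_depth(raw: Depth, synthetic: Depth) -> Depth:
    """Hypothetical helper: keep valid raw readings, fill raw holes (0.0) from synthetic."""
    return [[r if r > 0.0 else s for r, s in zip(raw_row, syn_row)]
            for raw_row, syn_row in zip(raw, synthetic)]


points = [(1.0, 0.0, 1.5), (0.0, 1.0, 1.2), (0.0, 1.0, 3.0), (2.0, 0.0, 3.0)]
raw = [[1.0, 0.0],
       [0.0, 2.0]]
synthetic = render_depth(points, 2, 2, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
merged = merge_depth(raw, synthetic)
# synthetic == [[3.0, 1.5], [1.2, 0.0]]
# merged    == [[1.0, 1.5], [1.2, 2.0]]
```

Preferring raw depth wherever it exists keeps the sensor as the primary source; a pixel missing from both images stays 0.0 and is still skipped downstream.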

Findings

1. Outperforms existing methods on the ScanNet200 and Replica datasets.
2. Demonstrates robustness and adaptability in real-world environments.
3. Achieves accurate zero-shot 3D instance segmentation without 3D-supervised segmentation models.

Abstract

We introduce OV-MAP, a novel approach to open-world 3D mapping for mobile robots that integrates open-vocabulary features into 3D maps to enhance object recognition. A significant challenge is that features from adjacent voxels overlap: they spill over voxel boundaries and blend neighboring regions together, which reduces instance-level precision. Our method addresses this by using a class-agnostic segmentation model to project 2D masks into 3D space, combined with a supplemented depth image created by merging raw depth with synthetic depth generated from point clouds. Together with a 3D mask voting mechanism, this enables accurate zero-shot 3D instance segmentation without relying on 3D-supervised segmentation models. We assess the effectiveness of our method through comprehensive experiments on public datasets such as ScanNet200 and Replica, demonstrating superior zero-shot…
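A minimal sketch of a 3D mask-voting step follows. It assumes each back-projected 2D mask has already been associated with a global instance ID and its points transformed into the world frame; each voxel then collects at most one vote per mask and keeps the majority instance if that instance has at least `min_votes` votes. The voxel grid, the majority rule, and the names `voxel_key`, `vote_instances`, and `min_votes` are hypothetical stand-ins, not the paper's exact mechanism.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Point3 = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxel_key(p: Point3, voxel_size: float) -> Voxel:
    """Hypothetical helper: quantize a 3D point to integer voxel coordinates."""
    return (math.floor(p[0] / voxel_size),
            math.floor(p[1] / voxel_size),
            math.floor(p[2] / voxel_size))


def vote_instances(observations: Iterable[Tuple[int, List[Point3]]],
                   voxel_size: float, min_votes: int = 2) -> Dict[Voxel, int]:
    """Hypothetical helper: label each voxel with its most-voted instance ID.

    `observations` yields (instance_id, world-frame points), one per projected mask.
    A voxel gets at most one vote per mask; weakly supported voxels stay unlabeled.
    """
    votes: Dict[Voxel, Counter] = defaultdict(Counter)
    for instance_id, points in observations:
        # dict.fromkeys deduplicates voxels within one mask while keeping order
        for key in dict.fromkeys(voxel_key(p, voxel_size) for p in points):
            votes[key][instance_id] += 1
    labels: Dict[Voxel, int] = {}
    for key, counter in votes.items():
        instance_id, count = counter.most_common(1)[0]
        if count >= min_votes:
            labels[key] = instance_id
    return labels


observations = [
    (1, [(0.05, 0.05, 0.05), (0.15, 0.05, 0.05)]),  # instance 1, view A (second point spills over)
    (1, [(0.06, 0.04, 0.02)]),                      # instance 1, view B
    (2, [(0.12, 0.01, 0.03)]),                      # instance 2, view B
    (2, [(0.14, 0.02, 0.01)]),                      # instance 2, view C
]
labels = vote_instances(observations, voxel_size=0.1)
# labels == {(0, 0, 0): 1, (1, 0, 0): 2}
```

Here a stray instance-1 point that lands in voxel (1, 0, 0) is outvoted by two observations of instance 2, the kind of boundary spill-over that voting across views is meant to suppress.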
