OV-MAP : Open-Vocabulary Zero-Shot 3D Instance Segmentation Map for Robots
Juno Kim, Yesol Park, Hye-Jung Yoon, Byoung-Tak Zhang

TL;DR
OV-MAP enables open-vocabulary zero-shot 3D instance segmentation for robots by integrating 2D mask projection, depth merging, and 3D mask voting, achieving robust performance without 3D supervision.
Contribution
This work introduces an open-world 3D mapping approach for mobile robots that combines class-agnostic 2D segmentation, a supplemented depth image merged from raw and synthetic depth, and 3D mask voting to improve zero-shot object recognition at the instance level.
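The voting step can be illustrated with a minimal sketch. This is not the authors' mechanism, which this summary does not specify. It assumes a plain per-point majority vote and that instance ids have already been made consistent across views; the function name `vote_instance_labels` and the `unlabeled` sentinel are hypothetical.

```python
import numpy as np

def vote_instance_labels(labels_per_view: np.ndarray, unlabeled: int = -1) -> np.ndarray:
    """Majority vote over per-view instance labels (illustrative assumption).

    labels_per_view has shape (V, N): entry [v, i] is the instance id that
    view v assigned to 3D point i, or `unlabeled` if the point was not seen.
    """
    num_views, num_points = labels_per_view.shape
    result = np.full(num_points, unlabeled, dtype=np.int64)
    for i in range(num_points):
        votes = labels_per_view[:, i]
        votes = votes[votes != unlabeled]  # ignore views that did not see the point
        if votes.size == 0:
            continue  # never observed: stays unlabeled
        ids, counts = np.unique(votes, return_counts=True)
        # np.unique returns sorted ids, so ties resolve to the smallest id
        result[i] = ids[np.argmax(counts)]
    return result
```

A vote of this kind lets a point that is mislabeled in one frame still receive the label most views agree on, which is one plausible reason to aggregate 2D masks in 3D rather than trust any single projection.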
Findings
Reports superior zero-shot performance over existing methods on the ScanNet200 and Replica datasets.
Demonstrates robustness and adaptability in real-world environments.
Achieves accurate zero-shot 3D instance segmentation without 3D supervised models.
Abstract
We introduce OV-MAP, a novel approach to open-world 3D mapping for mobile robots by integrating open-features into 3D maps to enhance object recognition capabilities. A significant challenge arises when overlapping features from adjacent voxels reduce instance-level precision, as features spill over voxel boundaries, blending neighboring regions together. Our method overcomes this by employing a class-agnostic segmentation model to project 2D masks into 3D space, combined with a supplemented depth image created by merging raw and synthetic depth from point clouds. This approach, along with a 3D mask voting mechanism, enables accurate zero-shot 3D instance segmentation without relying on 3D supervised segmentation models. We assess the effectiveness of our method through comprehensive experiments on public datasets such as ScanNet200 and Replica, demonstrating superior zero-shot…
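The depth supplementation and 2D-to-3D mask projection described in the abstract can be sketched as follows. This is a hedged illustration, not the paper's implementation: it assumes that merging means filling invalid raw depth pixels with synthetic depth rendered from the point cloud, and that projection uses a standard pinhole camera model. The function names and the intrinsics parameters `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
import numpy as np

def supplement_depth(raw_depth, synthetic_depth):
    """Fill invalid raw depth pixels with synthetic depth (assumed merge rule)."""
    raw = np.asarray(raw_depth, dtype=np.float64)
    syn = np.asarray(synthetic_depth, dtype=np.float64)
    # Treat zero, negative, NaN, and inf readings as sensor holes
    invalid = ~np.isfinite(raw) | (raw <= 0)
    return np.where(invalid, syn, raw)

def backproject_mask(mask, depth, fx, fy, cx, cy):
    """Lift masked pixels with valid depth to camera-frame 3D points (pinhole model)."""
    valid = mask & np.isfinite(depth) & (depth > 0)
    v, u = np.nonzero(valid)          # pixel rows (v) and columns (u)
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)  # shape (K, 3)

# Usage: a 2x2 frame with two holes in the raw depth
raw = np.array([[1.0, 0.0], [0.0, 2.0]])
synthetic = np.array([[5.0, 3.0], [4.0, 5.0]])
merged = supplement_depth(raw, synthetic)
mask = np.array([[True, True], [False, False]])
points = backproject_mask(mask, merged, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Preferring raw depth wherever it is valid keeps sensor measurements authoritative and uses the synthetic depth only to close holes, so a 2D mask does not lose pixels when it is lifted into 3D.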
