MV3DIS: Multi-View Mask Matching via 3D Guides for Zero-Shot 3D Instance Segmentation
Yibo Zhao, Yigong Zhang, and Jin Xie

TL;DR
MV3DIS is a zero-shot 3D instance segmentation framework that combines multi-view 2D masks with 3D priors to produce consistent, accurate segmentation without labor-intensive 3D annotations.
Contribution
The paper proposes a novel 3D-guided mask matching strategy and depth consistency weighting to improve multi-view mask consistency and robustness in zero-shot 3D segmentation.
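To give a rough sense of what a depth consistency weight can look like, here is a minimal sketch. It is not the paper's implementation: the function name `depth_consistency_weights`, the Gaussian falloff, and the tolerance `tau` are all assumptions made for illustration. The idea shown is a generic one: project 3D points into a view, compare each point's depth with the depth map at the projected pixel, and down-weight points that disagree (for example, because they are occluded in that view).

```python
import numpy as np

def depth_consistency_weights(points_cam, depth_map, K, tau=0.05):
    """Hypothetical helper (not from the paper).

    points_cam: (N, 3) points already expressed in the camera frame.
    depth_map:  (H, W) observed depth for the same view, same units as points.
    K:          (3, 3) pinhole intrinsics.
    tau:        assumed depth tolerance controlling how fast the weight decays.

    Returns (N,) weights in [0, 1]; points behind the camera or outside
    the image get weight 0.
    """
    H, W = depth_map.shape
    z = points_cam[:, 2]
    weights = np.zeros(len(points_cam), dtype=float)

    # Only points in front of the camera can be projected.
    in_front = z > 1e-6
    front_idx = np.flatnonzero(in_front)

    # Pinhole projection to pixel coordinates (nearest pixel).
    uvw = (K @ points_cam[in_front].T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)

    # Keep projections that land inside the image.
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    idx = front_idx[inside]

    # Compare the point's depth with the observed depth at that pixel.
    diff = np.abs(depth_map[v[inside], u[inside]] - z[idx])

    # Assumed Gaussian falloff: agreement -> 1, large mismatch -> 0.
    weights[idx] = np.exp(-(diff / tau) ** 2)
    return weights
```

In a pipeline of this kind, such weights could scale how much a view's 2D mask contributes to a given 3D point, so that views where the point is occluded have little influence. How MV3DIS actually defines and applies its weighting is not specified in the text here.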
Findings
MV3DIS outperforms previous methods on multiple datasets.
The approach achieves higher 3D segmentation accuracy than the previous methods it is compared against.
Incorporating 3D priors improves multi-view mask consistency.
Abstract
Conventional 3D instance segmentation methods rely on labor-intensive 3D annotations for supervised training, which limits their scalability and generalization to novel objects. Recent approaches leverage multi-view 2D masks from the Segment Anything Model (SAM) to guide the merging of 3D geometric primitives, thereby enabling zero-shot 3D instance segmentation. However, these methods typically process each frame independently and rely solely on 2D metrics, such as SAM prediction scores, to produce segmentation maps. This design overlooks multi-view correlations and inherent 3D priors, leading to inconsistent 2D masks across views and ultimately fragmented 3D segmentation. In this paper, we propose MV3DIS, a coarse-to-fine framework for zero-shot 3D instance segmentation that explicitly incorporates 3D priors. Specifically, we introduce a 3D-guided mask matching strategy that uses…
