TL;DR
MoonSeg3R introduces an online zero-shot monocular 3D instance segmentation method. It uses CUT3R, a reconstructive foundation model, to supply geometric priors from a single RGB stream, and reaches performance competitive with RGB-D-based systems without requiring posed RGB-D sequences.
Contribution
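The paper presents MoonSeg3R, which enables online monocular 3D instance segmentation using geometric priors from a reconstructive foundation model (CUT3R). It adds three modules: self-supervised query refinement with spatial-semantic distillation, a 3D query index memory for temporal consistency, and a CUT3R state-distribution token that serves as a mask identity descriptor for cross-frame fusion.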
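The summary does not say how the 3D query index memory is keyed. The following is a minimal sketch that assumes each mask has already been lifted to 3D points and that queries are indexed by the voxels those points occupy. `QueryIndexMemory`, `voxel_key`, and `VOXEL_SIZE` are hypothetical names chosen for illustration; this is not the authors' implementation.

```python
from collections import defaultdict
from math import floor

# Hypothetical sketch of a voxel-indexed query memory.
# VOXEL_SIZE and every name below are illustrative assumptions, not the paper's API.
VOXEL_SIZE = 0.05  # edge length of one voxel, in scene units (assumed)


def voxel_key(point, voxel_size=VOXEL_SIZE):
    """Quantize a 3D point (x, y, z) to an integer voxel key."""
    return tuple(floor(c / voxel_size) for c in point)


class QueryIndexMemory:
    """Stores query vectors and indexes them by the voxels their masks occupy."""

    def __init__(self, voxel_size=VOXEL_SIZE):
        self.voxel_size = voxel_size
        self.queries = {}                # query_id -> feature vector
        self.index = defaultdict(set)    # voxel key -> set of query_ids

    def insert(self, query_id, feature, points):
        """Register a query together with the 3D points of its mask."""
        self.queries[query_id] = feature
        for p in points:
            self.index[voxel_key(p, self.voxel_size)].add(query_id)

    def retrieve(self, points):
        """Return stored query IDs ranked by how many voxels they share with `points`."""
        counts = defaultdict(int)
        for key in {voxel_key(p, self.voxel_size) for p in points}:
            # .get avoids creating empty entries in the defaultdict
            for qid in self.index.get(key, ()):
                counts[qid] += 1
        # Most overlapping voxels first; ties broken by query ID for determinism
        return sorted(counts, key=lambda q: (-counts[q], q))
```

For example, after `mem.insert(0, [1.0, 0.0], [(0.01, 0.01, 0.01)])`, calling `mem.retrieve([(0.02, 0.02, 0.02)])` returns `[0]` because both points fall in the same voxel. A spatial hash keeps retrieval cost proportional to the size of the new mask rather than the size of the whole memory. That property matters in an online setting, but the paper may use a different indexing scheme.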
Findings
First method for online zero-shot monocular 3D instance segmentation.
Achieves performance competitive with RGB-D systems.
Demonstrated on the real-world datasets ScanNet200 and SceneNN.
Abstract
In this paper, we focus on online zero-shot monocular 3D instance segmentation, a novel practical setting where existing approaches fail to perform because they rely on posed RGB-D sequences. To overcome this limitation, we leverage CUT3R, a recent Reconstructive Foundation Model (RFM), to provide reliable geometric priors from a single RGB stream. We propose MoonSeg3R, which introduces three key components: (1) a self-supervised query refinement module with spatial-semantic distillation that transforms segmentation masks from 2D visual foundation models (VFMs) into discriminative 3D queries; (2) a 3D query index memory that provides temporal consistency by retrieving contextual queries; and (3) a state-distribution token from CUT3R that acts as a mask identity descriptor to strengthen cross-frame fusion. Experiments on ScanNet200 and SceneNN show that MoonSeg3R is the first method to…
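The abstract says the CUT3R state-distribution token acts as a mask identity descriptor during cross-frame fusion, but it does not give the fusion rule. The sketch below assumes each mask already carries such a descriptor as a plain vector. It substitutes a simple greedy cosine-similarity matcher with a fixed threshold. `fuse_frame`, `SIM_THRESHOLD`, and the 50/50 descriptor update are hypothetical choices for illustration and are not taken from the paper.

```python
import math

# Hypothetical sketch: greedy cross-frame fusion using per-mask identity descriptors.
# SIM_THRESHOLD and the descriptor update rule are illustrative assumptions.
SIM_THRESHOLD = 0.8


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def fuse_frame(instances, new_descriptors, threshold=SIM_THRESHOLD):
    """Assign each new mask descriptor to an existing instance or a new one.

    instances: dict of int instance_id -> descriptor (updated in place).
    new_descriptors: descriptors of the masks in the current frame.
    Returns the instance ID assigned to each new mask, in input order.
    """
    # Score every (mask, instance) pair, then match greedily from most similar down.
    pairs = sorted(
        ((cosine(d, desc), i, iid)
         for i, d in enumerate(new_descriptors)
         for iid, desc in instances.items()),
        reverse=True,
    )
    assignment = [None] * len(new_descriptors)
    used = set()
    for sim, i, iid in pairs:
        if sim < threshold:
            break  # all remaining pairs are even less similar
        if assignment[i] is None and iid not in used:
            assignment[i] = iid
            used.add(iid)

    next_id = max(instances, default=-1) + 1
    for i, d in enumerate(new_descriptors):
        if assignment[i] is None:
            # Unmatched mask: start a new instance with its descriptor
            instances[next_id] = list(d)
            assignment[i] = next_id
            next_id += 1
        else:
            # Matched mask: blend stored and new descriptors 50/50 (assumed update rule)
            iid = assignment[i]
            instances[iid] = [(a + b) / 2 for a, b in zip(instances[iid], d)]
    return assignment
```

Greedy one-to-one matching stops two masks in the same frame from merging into one instance. A learned or Hungarian matcher would be a natural alternative if the actual method uses one.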
