MoonSeg3R: Monocular Online Zero-Shot Segment Anything in 3D with Reconstructive Foundation Priors

Zhipeng Du; Duolikun Danier; Jan Eric Lenssen; Hakan Bilen

arXiv:2512.15577 · cs.CV · April 22, 2026

TL;DR

MoonSeg3R introduces a novel online zero-shot monocular 3D segmentation method leveraging a reconstructive foundation model, achieving state-of-the-art performance without requiring posed RGB-D sequences.

Contribution

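The paper presents MoonSeg3R, an approach to online monocular 3D segmentation that takes its geometric priors from a reconstructive foundation model (CUT3R) rather than from posed RGB-D input. It adds a self-supervised query refinement module, a 3D query index memory for temporal consistency, and a state-distribution token that strengthens cross-frame fusion.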

Findings

01 First online monocular 3D segmentation method.

02 Achieves performance competitive with RGB-D systems.

03 Effective on the real-world datasets ScanNet200 and SceneNN.

Abstract

In this paper, we focus on online zero-shot monocular 3D instance segmentation, a novel practical setting where existing approaches fail to perform because they rely on posed RGB-D sequences. To overcome this limitation, we leverage CUT3R, a recent Reconstructive Foundation Model (RFM), to provide reliable geometric priors from a single RGB stream. We propose MoonSeg3R, which introduces three key components: (1) a self-supervised query refinement module with spatial-semantic distillation that transforms segmentation masks from 2D visual foundation models (VFMs) into discriminative 3D queries; (2) a 3D query index memory that provides temporal consistency by retrieving contextual queries; and (3) a state-distribution token from CUT3R that acts as a mask identity descriptor to strengthen cross-frame fusion. Experiments on ScanNet200 and SceneNN show that MoonSeg3R is the first method to…
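To make the idea of a query memory with cross-frame fusion concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: MoonSeg3R's modules are learned and the abstract does not describe them at this level. The `QueryMemory` class, the cosine-similarity matching, the running-mean update, and the `threshold` value are all assumptions chosen for illustration. The sketch only shows the general pattern of retrieving stored per-instance descriptors and either merging a new mask descriptor into a matching instance or registering a new one.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


@dataclass
class QueryMemory:
    """Hypothetical memory holding one running-mean descriptor per instance ID."""
    threshold: float = 0.8                            # assumed matching threshold
    descriptors: dict = field(default_factory=dict)   # instance_id -> vector
    counts: dict = field(default_factory=dict)        # instance_id -> observations

    def retrieve(self, query, k=1):
        """Return up to k (similarity, instance_id) pairs, most similar first."""
        scored = [(cosine(query, d), i) for i, d in self.descriptors.items()]
        scored.sort(reverse=True)
        return scored[:k]

    def fuse(self, query):
        """Merge query into the best-matching instance, or register a new one."""
        best = self.retrieve(query, k=1)
        if best and best[0][0] >= self.threshold:
            inst = best[0][1]
            n = self.counts[inst]
            old = self.descriptors[inst]
            # Running mean keeps the stored descriptor stable across frames.
            self.descriptors[inst] = [(o * n + q) / (n + 1) for o, q in zip(old, query)]
            self.counts[inst] = n + 1
            return inst
        inst = len(self.descriptors)
        self.descriptors[inst] = list(query)
        self.counts[inst] = 1
        return inst


# Toy descriptors for two masks in each of two consecutive frames.
memory = QueryMemory(threshold=0.8)
frame1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
frame2 = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
ids1 = [memory.fuse(q) for q in frame1]
ids2 = [memory.fuse(q) for q in frame2]
print(ids1, ids2)
```

In this toy run the first mask of the second frame is close enough to an existing descriptor to inherit its instance ID, while the second mask is unlike anything stored and gets a new ID. A learned system would replace the hand-written similarity and update rule with trained components.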

Code & Models

Repositories

VICO-UoE/MoonSeg3R (GitHub)