MOSE: Monocular Semantic Reconstruction Using NeRF-Lifted Noisy Priors

Zhenhua Du; Binbin Xu; Haoyu Zhang; Kai Huo; Shuaifeng Zhi

arXiv:2409.14019 · cs.CV · September 24, 2024

TL;DR

MOSE is a neural field approach that lifts noisy image-level priors, inferred from monocular images, into 3D, improving both semantics and geometry in 3D and 2D space.

Contribution

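The paper presents MOSE, a method that uses generic class-agnostic segment masks as guidance to promote local consistency of rendered semantics during training, and applies a semantics-aware smoothness regularization to texture-less regions, to improve 3D semantic and geometric reconstruction from monocular images.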

Findings

01 Outperforms baselines in 3D semantic segmentation

02 Achieves better 2D semantic segmentation results

03 Improves 3D surface reconstruction quality

Abstract

Accurately reconstructing dense and semantically annotated 3D meshes from monocular images remains a challenging task due to the lack of geometry guidance and imperfect view-dependent 2D priors. Though we have witnessed recent advancements in implicit neural scene representations enabling precise 2D rendering simply from multi-view images, there have been few works addressing 3D scene understanding with monocular priors alone. In this paper, we propose MOSE, a neural field semantic reconstruction approach to lift inferred image-level noisy priors to 3D, producing accurate semantics and geometry in both 3D and 2D space. The key motivation for our method is to leverage generic class-agnostic segment masks as guidance to promote local consistency of rendered semantics during training. With the help of semantics, we further apply a smoothness regularization to texture-less regions for…
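
The idea of using class-agnostic segment masks to promote local consistency of rendered semantics can be illustrated with a small sketch. This is not the paper's loss, which the excerpt above does not specify. It is one plausible formulation, assuming that each pixel's rendered class distribution is pulled toward the mean distribution of the segment it belongs to. The function name `mask_consistency_loss`, the KL-divergence form, and the convention that negative segment ids mean "no segment" are all assumptions made for illustration.

```python
import numpy as np


def mask_consistency_loss(probs, mask_ids):
    """Hypothetical local-consistency penalty (illustrative, not from the paper).

    probs:    (N, C) rendered per-pixel class probabilities, rows sum to 1.
    mask_ids: (N,) integer id of the class-agnostic segment each pixel
              falls in; negative ids are treated as "no segment" and skipped.

    Returns the mean KL divergence from each pixel's distribution to the
    average distribution of its segment. It is zero when all pixels inside
    every segment agree, and grows as they disagree.
    """
    eps = 1e-8  # avoids log(0)
    total, count = 0.0, 0
    for seg in np.unique(mask_ids):
        if seg < 0:
            continue
        p = probs[mask_ids == seg]              # (n, C) pixels in this segment
        mean = p.mean(axis=0, keepdims=True)    # segment-level consensus
        kl = np.sum(p * (np.log(p + eps) - np.log(mean + eps)), axis=1)
        total += float(kl.sum())
        count += p.shape[0]
    return total / max(count, 1)


if __name__ == "__main__":
    ids = np.array([0, 0, 1, 1])
    agree = np.array([[0.9, 0.1], [0.9, 0.1], [0.2, 0.8], [0.2, 0.8]])
    differ = np.array([[0.9, 0.1], [0.1, 0.9], [0.2, 0.8], [0.2, 0.8]])
    print(mask_consistency_loss(agree, ids))   # segments internally consistent
    print(mask_consistency_loss(differ, ids))  # segment 0 is inconsistent
```

Because the masks carry no class labels, a penalty of this kind only asks that pixels within a segment agree with one another; which class they agree on is left to the noisy 2D semantic priors.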
