Monocular Semantic Scene Completion via Masked Recurrent Networks

Xuzhi Wang, Xinran Wu, Song Wang, Lingdong Kong, Ziping Zhao

arXiv:2507.17661 · cs.CV · July 24, 2025

TL;DR

This paper introduces MonoMRN, a two-stage monocular semantic scene completion framework using masked recurrent networks, which improves accuracy and robustness in complex indoor and outdoor scenes.

Contribution

The paper proposes a two-stage framework built on a Masked Recurrent Network, comprising the Masked Sparse Gated Recurrent Unit (MS-GRU) and a distance attention projection, to improve scene completion accuracy and efficiency.

Findings

1. Achieves state-of-the-art performance on the NYUv2 and SemanticKITTI datasets.
2. Demonstrates robustness to various disturbances in scene completion.
3. Supports both indoor (NYUv2) and outdoor (SemanticKITTI) scene understanding.

Abstract

Monocular Semantic Scene Completion (MSSC) aims to predict voxel-wise occupancy and semantic categories from a single-view RGB image. Existing methods adopt a single-stage framework that must simultaneously segment visible regions and hallucinate occluded regions, while also being affected by inaccurate depth estimation. Such methods often achieve suboptimal performance, especially in complex scenes. We propose a novel two-stage framework that decomposes MSSC into a coarse MSSC stage followed by a Masked Recurrent Network. Specifically, we propose the Masked Sparse Gated Recurrent Unit (MS-GRU), which concentrates on occupied regions through a mask updating mechanism and adopts a sparse GRU design to reduce computational cost. Additionally, we propose the distance attention projection to reduce projection errors by assigning different attention scores…
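
The abstract describes MS-GRU only at a high level, so what follows is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes a per-voxel GRU cell with dense weight matrices acting on flattened voxel features, and it imitates the "sparse" design by gathering only the masked voxels, updating them, and scattering the results back, so inactive voxels keep their state and cost no gate computation. The mask update rule (a hypothetical linear occupancy head thresholded at 0.5), the initialization of the mask from the coarse first-stage output, and the names `MaskedSparseGRUSketch`, `update_mask` and `refine` are illustrative inventions; the paper's actual mask updating mechanism and sparse operators may differ.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class MaskedSparseGRUSketch:
    """Hypothetical masked, sparse GRU cell over flattened voxel features (N, C)."""

    def __init__(self, channels, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(channels)
        # One input weight W and hidden weight U per gate: update (z), reset (r), candidate (n).
        self.W = {g: rng.normal(0.0, scale, (channels, channels)) for g in "zrn"}
        self.U = {g: rng.normal(0.0, scale, (channels, channels)) for g in "zrn"}
        self.b = {g: np.zeros(channels) for g in "zrn"}
        # Hypothetical linear occupancy head that drives the mask update.
        self.w_occ = rng.normal(0.0, scale, channels)

    def step(self, x, h, mask):
        """One GRU update applied only where mask is True; other voxels keep h unchanged."""
        idx = np.flatnonzero(mask)  # gather: compute gates for active voxels only
        xs, hs = x[idx], h[idx]
        z = sigmoid(xs @ self.W["z"] + hs @ self.U["z"] + self.b["z"])
        r = sigmoid(xs @ self.W["r"] + hs @ self.U["r"] + self.b["r"])
        n = np.tanh(xs @ self.W["n"] + (r * hs) @ self.U["n"] + self.b["n"])
        h_new = h.copy()
        h_new[idx] = z * hs + (1.0 - z) * n  # scatter the updated states back
        return h_new

    def update_mask(self, h, threshold=0.5):
        """Hypothetical rule: keep voxels whose predicted occupancy exceeds threshold."""
        return sigmoid(h @ self.w_occ) > threshold


def refine(coarse_occupancy, feats, cell, steps=3):
    """Hypothetical second stage: iterate the masked GRU starting from a coarse prediction."""
    mask = coarse_occupancy > 0.5  # initial mask taken from the coarse (first-stage) output
    h = feats.copy()
    for _ in range(steps):
        h = cell.step(feats, h, mask)
        mask = cell.update_mask(h)
    return h, mask


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    feats = rng.normal(size=(1000, 16))  # 1000 voxels, 16 channels (synthetic)
    coarse = rng.random(1000)            # synthetic coarse occupancy probabilities
    h, mask = refine(coarse, feats, MaskedSparseGRUSketch(16))
    print(h.shape, int(mask.sum()), "active voxels")
```

The gather/scatter pattern makes the gate computation scale with the number of active voxels rather than the full grid. A practical version would likely replace the dense per-voxel matrices with spatial operators (for example, sparse convolutions); this sketch does not attempt that.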
