IMENet: Joint 3D Semantic Scene Completion and 2D Semantic Segmentation through Iterative Mutual Enhancement
Jie Li, Laiyan Ding, Rui Huang

TL;DR
IMENet introduces an iterative mutual enhancement framework that jointly solves 3D semantic scene completion and 2D semantic segmentation by letting each task's predictions refine the other's at the late prediction stage, improving indoor scene understanding.
Contribution
The paper proposes a novel iterative late-fusion approach in which two refinement modules, built under a unified framework, let the 3D and 2D tasks refine each other's predictions; the approach outperforms existing methods.
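To make "iterative late fusion" concrete, here is a minimal, hypothetical sketch of an alternating refinement loop. It assumes each iteration first refines the 2D prediction with a projection of the current 3D prediction, then refines the 3D prediction with the updated 2D one. The names `mutual_enhancement`, `refine_2d`, `refine_3d`, `project_3d_to_2d`, `project_2d_to_3d` and `num_iters` are illustrative assumptions, not the paper's API. The toy `average` and `identity` stand-ins replace the real learned modules and geometric projections.

```python
from typing import Callable, List, Tuple

# Hypothetical toy type: a flat list of scores standing in for a 2D or 3D prediction tensor.
Pred = List[float]


def mutual_enhancement(
    pred_2d: Pred,
    pred_3d: Pred,
    refine_2d: Callable[[Pred, Pred], Pred],
    refine_3d: Callable[[Pred, Pred], Pred],
    project_3d_to_2d: Callable[[Pred], Pred],
    project_2d_to_3d: Callable[[Pred], Pred],
    num_iters: int = 2,
) -> Tuple[Pred, Pred]:
    """Alternately refine each branch with the other branch's latest prediction.

    The refine/project callables and the update order are assumptions for
    illustration, not IMENet's actual modules.
    """
    for _ in range(num_iters):
        # 2D step: refine the 2D prediction using the projected current 3D prediction.
        pred_2d = refine_2d(pred_2d, project_3d_to_2d(pred_3d))
        # 3D step: refine the 3D prediction using the just-updated 2D prediction.
        pred_3d = refine_3d(pred_3d, project_2d_to_3d(pred_2d))
    return pred_2d, pred_3d


def average(own: Pred, other: Pred) -> Pred:
    """Toy refinement: blend a branch's prediction with the other branch's projected one."""
    return [(a + b) / 2 for a, b in zip(own, other)]


def identity(p: Pred) -> Pred:
    """Toy projection: assumes both branches share the same index space."""
    return list(p)


p2d, p3d = mutual_enhancement(
    [0.0, 1.0], [1.0, 0.0], average, average, identity, identity, num_iters=2
)
print(p2d, p3d)
# [0.625, 0.375] [0.6875, 0.3125]
```

The sketch shows fusion operating on predictions *after* both branches have produced them, and repeating. This contrasts with fusing RGB-D inputs once at the start.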
Findings
Outperforms state-of-the-art methods on the NYU and NYUCAD datasets
Effective late-stage feature fusion improves both tasks
Mutual refinement enhances 3D and 2D semantic predictions
Abstract
3D semantic scene completion and 2D semantic segmentation are two tightly correlated tasks that are both essential for indoor scene understanding, because they predict the same semantic classes, using positively correlated high-level features. Current methods use 2D features extracted from early-fused RGB-D images for 2D segmentation to improve 3D scene completion. We argue that this sequential scheme does not ensure these two tasks fully benefit each other, and present an Iterative Mutual Enhancement Network (IMENet) to solve them jointly, which interactively refines the two tasks at the late prediction stage. Specifically, two refinement modules are developed under a unified framework for the two tasks. The first is a 2D Deformable Context Pyramid (DCP) module, which receives the projection from the current 3D predictions to refine the 2D predictions. In turn, a 3D Deformable Depth…
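The abstract states that the 2D module receives "the projection from the current 3D predictions". One common way to compute such a 3D-to-2D projection is to back-project each pixel through a pinhole camera using its depth, find the voxel it lands in, and copy that voxel's class scores to the pixel. The sketch below does this under explicit assumptions: a camera-aligned voxel grid with origin `(0, 0, 0)`, intrinsics `(fx, fy, cx, cy)`, a sparse `dict` of voxel scores, and the hypothetical helpers `pixel_to_voxel` and `project_3d_to_2d`. IMENet's actual grid layout and projection details are not given here.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Voxel = Tuple[int, int, int]


def pixel_to_voxel(
    u: int, v: int, depth: float,
    fx: float, fy: float, cx: float, cy: float,
    voxel_size: float,
    origin: Sequence[float] = (0.0, 0.0, 0.0),
) -> Optional[Voxel]:
    """Back-project pixel (u, v) with a pinhole model and return the voxel it lands in."""
    if depth <= 0:
        return None  # missing depth: the pixel has no 3D correspondence
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    z = depth
    i, j, k = (math.floor((c - o) / voxel_size) for c, o in zip((x, y, z), origin))
    return (i, j, k)


def project_3d_to_2d(
    voxel_probs: Dict[Voxel, List[float]],
    depth_map: List[List[float]],
    intrinsics: Tuple[float, float, float, float],
    voxel_size: float,
    num_classes: int,
) -> List[List[List[float]]]:
    """Gather, for every pixel, the class scores of the voxel its depth points to."""
    fx, fy, cx, cy = intrinsics
    empty = [0.0] * num_classes  # pixels with no (or an unpredicted) voxel get zeros
    out = []
    for v, row in enumerate(depth_map):
        out_row = []
        for u, d in enumerate(row):
            idx = pixel_to_voxel(u, v, d, fx, fy, cx, cy, voxel_size)
            scores = voxel_probs.get(idx, empty) if idx is not None else empty
            out_row.append(list(scores))
        out.append(out_row)
    return out


depth_map = [[1.0, 1.0], [0.0, 2.0]]
voxel_probs = {(0, 0, 1): [0.9, 0.1], (1, 0, 1): [0.2, 0.8]}
probs_2d = project_3d_to_2d(voxel_probs, depth_map, (1.0, 1.0, 0.0, 0.0), 1.0, num_classes=2)
print(probs_2d)
# [[[0.9, 0.1], [0.2, 0.8]], [[0.0, 0.0], [0.0, 0.0]]]
```

The same pixel-to-voxel correspondence can be reused in the opposite direction, scattering 2D scores into the voxels their pixels hit. A real implementation would typically operate on dense tensors rather than Python dictionaries.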
Taxonomy
Topics: Advanced Vision and Imaging · 3D Surveying and Cultural Heritage · Remote Sensing and LiDAR Applications
