A Real-Time Online Learning Framework for Joint 3D Reconstruction and Semantic Segmentation of Indoor Scenes
Davide Menini, Suryansh Kumar, Martin R. Oswald, Erik Sandstrom, Cristian Sminchisescu, Luc Van Gool

TL;DR
This paper introduces a real-time online framework that jointly reconstructs 3D indoor scenes and performs semantic segmentation using deep learning, improving accuracy and efficiency in noisy, real-world conditions.
Contribution
It proposes a novel deep neural network with vortex pooling for online depth and semantic fusion, eliminating routing networks to enhance detail preservation and noise resistance.
Findings
Performs depth fusion at 37 and 10 fps on the Replica dataset, depending on depth map resolution
Attains average reconstruction F-scores of 88% and 91%, respectively, at those two settings
Secures an average IoU of 0.515 on the ScanNet benchmark
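The two metrics named in these findings can be sketched as follows. This is a minimal illustration using their standard definitions, not the paper's evaluation code: the function names, the brute-force nearest-neighbour search, and the distance threshold `tau` are assumptions made here for clarity.

```python
import math


def reconstruction_f_score(recon_pts, gt_pts, tau):
    """Harmonic mean of precision and recall between two 3D point sets.

    precision: fraction of reconstructed points within `tau` of some GT point
    recall:    fraction of GT points within `tau` of some reconstructed point
    `tau` is a hypothetical threshold; the summary does not state the one used.
    """
    def frac_within(src, dst):
        # Brute-force nearest neighbour; fine for a small illustration.
        hits = sum(1 for p in src if min(math.dist(p, q) for q in dst) <= tau)
        return hits / len(src)

    precision = frac_within(recon_pts, gt_pts)
    recall = frac_within(gt_pts, recon_pts)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union over classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union > 0:  # skip classes that never occur
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy inputs (made up for illustration only).
recon = [(0, 0, 0), (1, 0, 0), (5, 0, 0)]
truth = [(0, 0, 0), (1, 0, 0)]
f = reconstruction_f_score(recon, truth, tau=0.1)
m = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

In the toy example one of the three reconstructed points is a false surface far from the ground truth, so precision drops while recall stays at 1; the F-score penalises exactly the kind of artifact the paper aims to reduce.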
Abstract
This paper presents a real-time online vision framework to jointly recover an indoor scene's 3D structure and semantic label. Given noisy depth maps, a camera trajectory, and 2D semantic labels at train time, the proposed deep neural network based approach learns to fuse the depth over frames with suitable semantic labels in the scene space. Our approach exploits the joint volumetric representation of the depth and semantics in the scene feature space to solve this task. For a compelling online fusion of the semantic labels and geometry in real-time, we introduce an efficient vortex pooling block while dropping the use of a routing network in online depth fusion to preserve high-frequency surface details. We show that the context information provided by the semantics of the scene helps the depth fusion network learn noise-resistant features. Not only that, it helps overcome the…
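To make "fusing depth over frames" concrete, the sketch below shows the classical hand-crafted baseline: a weighted running average over truncated signed distance (TSDF) values, applied per voxel. It is not the paper's learned network and is included only as background; the function name, the truncation distance `trunc`, and the unit observation weight are assumptions chosen for illustration.

```python
def fuse_tsdf(tsdf, weight, new_sdf, new_weight=1.0, trunc=0.05):
    """Integrate one signed-distance observation into a single voxel.

    tsdf, weight : current fused value and accumulated weight of the voxel
    new_sdf      : signed distance to the surface seen in the new depth frame
    trunc        : hypothetical truncation distance (same units as new_sdf)
    Returns the updated (tsdf, weight) pair.
    """
    # Truncate so far-away observations cannot dominate the average.
    d = max(-trunc, min(trunc, new_sdf))
    fused = (weight * tsdf + new_weight * d) / (weight + new_weight)
    return fused, weight + new_weight


# Three noisy observations of the same voxel, integrated online.
value, w = 0.0, 0.0
for obs in (0.04, 0.02, 0.5):  # the last one is an outlier, clipped to 0.05
    value, w = fuse_tsdf(value, w, obs)
```

Because every observation is averaged in with a fixed rule, outliers and noise leak into the surface; learned fusion methods such as the one summarised here replace this update with a network that decides how each new measurement should change the scene representation.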
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Remote Sensing and LiDAR Applications
