A Modular Framework for Single-View 3D Reconstruction of Indoor Environments
Yuxiao Li

TL;DR
This paper introduces a modular diffusion-based framework for single-view indoor scene 3D reconstruction, improving accuracy and visual quality by separately predicting occluded parts, room layout, and scene alignment.
Contribution
It presents a novel modular approach combining amodal completion, inpainting, hybrid depth estimation, and view-space alignment for improved indoor scene reconstruction from a single image.
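The summary does not say how the hybrid depth estimation combines its predictors. Below is a minimal sketch of one common recipe, offered as an assumption and not as the paper's method: a relative (affine-invariant) depth map is aligned to a metric reference by fitting a least-squares scale s and shift t over the pixels where the reference is trusted. `fit_scale_shift`, `align_depth`, and the `valid` mask are hypothetical names for illustration only.

```python
from typing import List, Optional, Sequence, Tuple


def fit_scale_shift(
    relative: Sequence[float],
    metric: Sequence[float],
    valid: Optional[Sequence[bool]] = None,
) -> Tuple[float, float]:
    """Closed-form least squares: minimize sum((s * r + t - m) ** 2) over valid pixels.

    Hypothetical helper (assumed recipe, not the paper's API).
    """
    if valid is None:
        valid = [True] * len(relative)
    pairs = [(r, m) for r, m, v in zip(relative, metric, valid) if v]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two valid pixels")
    mean_r = sum(r for r, _ in pairs) / n
    mean_m = sum(m for _, m in pairs) / n
    var_r = sum((r - mean_r) ** 2 for r, _ in pairs)
    if var_r == 0:
        raise ValueError("relative depth is constant over the valid pixels")
    cov = sum((r - mean_r) * (m - mean_m) for r, m in pairs)
    s = cov / var_r              # slope of the 1-D linear regression
    t = mean_m - s * mean_r      # intercept
    return s, t


def align_depth(
    relative: Sequence[float],
    metric: Sequence[float],
    valid: Optional[Sequence[bool]] = None,
) -> List[float]:
    """Apply the fitted scale and shift to every pixel, including untrusted ones."""
    s, t = fit_scale_shift(relative, metric, valid)
    return [s * r + t for r in relative]
```

For example, `align_depth([1, 2, 3, 4, 5], [3, 5, 7, 9, 100], [True, True, True, True, False])` ignores the outlier at the last pixel, recovers s = 2 and t = 1, and returns `[3.0, 5.0, 7.0, 9.0, 11.0]`. The fit uses only two parameters, so it keeps the relative map's fine structure while borrowing the metric map's absolute scale.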
Findings
Outperforms state-of-the-art methods on the 3D-FRONT dataset
Enhances reconstruction quality of occluded and background regions
Achieves more accurate placement of scene components
Abstract
We propose a modular framework for single-view indoor scene 3D reconstruction, in which several core modules are powered by diffusion techniques. Traditional approaches to this task often struggle with the complex instance shapes and occlusions inherent in indoor environments. They frequently overshoot by attempting to predict 3D shapes directly from incomplete 2D images, which limits reconstruction quality. We aim to overcome this limitation by splitting the process into two steps: first, we use diffusion-based techniques to predict complete views of the room background and of occluded indoor instances; then we transform these views into 3D. Our framework contributes the following components: an amodal completion module for restoring the full view of occluded instances, an inpainting model specifically trained to predict room layouts, a hybrid…
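The abstract's second step, transforming the completed 2D views into 3D, is not detailed in this summary. As a minimal sketch, assuming a pinhole camera with known intrinsics `fx`, `fy`, `cx`, `cy` (hypothetical parameters) and a depth map that stores z along the optical axis, each pixel can be back-projected into camera space. The optional `mask` is an illustrative assumption: it could select one amodally completed instance, or the inpainted background, at a time.

```python
from typing import List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]


def backproject(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    mask: Optional[Sequence[Sequence[bool]]] = None,
) -> List[Point3]:
    """Lift a depth map to camera-space 3D points under a pinhole model (assumed).

    Pixel (u, v) with depth z maps to ((u - cx) * z / fx, (v - cy) * z / fy, z).
    Pixels with non-positive depth, or where mask is False, are skipped.
    """
    points: List[Point3] = []
    for v, row in enumerate(depth):          # v indexes image rows
        for u, z in enumerate(row):          # u indexes image columns
            if z <= 0 or (mask is not None and not mask[v][u]):
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```

For a 2x2 depth map filled with 2.0 and `fx = fy = 1.0`, `cx = cy = 0.5`, this returns the four corners `(±1.0, ±1.0, 2.0)` in row-major pixel order. Lifting each completed view separately keeps the pipeline modular: a background point set and per-instance point sets can then be placed in a shared camera frame.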
Taxonomy
Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization
