A Modular Framework for Single-View 3D Reconstruction of Indoor Environments

Yuxiao Li

arXiv:2512.17955 · cs.CV · December 23, 2025

Open Access

TL;DR

This paper introduces a modular diffusion-based framework for single-view indoor scene 3D reconstruction, improving accuracy and visual quality by separately predicting occluded parts, room layout, and scene alignment.

Contribution

It presents a novel modular approach combining amodal completion, inpainting, hybrid depth estimation, and view-space alignment for improved indoor scene reconstruction from a single image.

Findings

01 Outperforms state-of-the-art methods on the 3D-Front dataset

02 Enhances reconstruction quality of occluded and background regions

03 Achieves more accurate placement of scene components

Abstract

We propose a modular framework for single-view indoor scene 3D reconstruction, where several core modules are powered by diffusion techniques. Traditional approaches for this task often struggle with the complex instance shapes and occlusions inherent in indoor environments. They frequently overshoot by attempting to predict 3D shapes directly from incomplete 2D images, which results in limited reconstruction quality. We aim to overcome this limitation by splitting the process into two steps: first, we employ diffusion-based techniques to predict the complete views of the room background and occluded indoor instances, then transform them into 3D. Our modular framework makes contributions to this field through the following components: an amodal completion module for restoring the full view of occluded instances, an inpainting model specifically trained to predict room layouts, a hybrid…
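The modular decomposition described in the abstract can be pictured as a chain of independent stages that each read and extend a shared state. The following is a minimal structural sketch only, not the author's implementation: every function name, the dictionary keys, and the stage order are assumptions made for illustration, and each stage body is a placeholder standing in for a trained model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# A stage takes the shared state and returns an extended copy of it.
Stage = Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class ModularPipeline:
    """Runs named stages in order, recording which ones executed."""
    stages: List[Tuple[str, Stage]] = field(default_factory=list)

    def add(self, name: str, fn: Stage) -> "ModularPipeline":
        self.stages.append((name, fn))
        return self

    def run(self, image: Any) -> Dict[str, Any]:
        state: Dict[str, Any] = {"image": image, "trace": []}
        for name, fn in self.stages:
            state = fn(state)
            state["trace"].append(name)
        return state


# Hypothetical placeholder stages. Real modules would wrap trained models.
def amodal_completion(state: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder: restore the full view of each occluded instance.
    return {**state, "complete_instances": "placeholder"}


def layout_inpainting(state: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder: predict the room background behind the instances.
    return {**state, "room_layout": "placeholder"}


def depth_estimation(state: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder: estimate depth for the completed views.
    return {**state, "depth": "placeholder"}


def lift_to_3d(state: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder: turn completed 2D views into 3D geometry.
    return {**state, "meshes": "placeholder"}


def view_space_alignment(state: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder: place the 3D components consistently in the scene.
    return {**state, "scene": "placeholder"}


def build_pipeline() -> ModularPipeline:
    # Assumed order: complete the 2D views first, then move to 3D.
    return (
        ModularPipeline()
        .add("amodal_completion", amodal_completion)
        .add("layout_inpainting", layout_inpainting)
        .add("depth_estimation", depth_estimation)
        .add("lift_to_3d", lift_to_3d)
        .add("view_space_alignment", view_space_alignment)
    )


if __name__ == "__main__":
    result = build_pipeline().run("input.png")
    print(result["trace"])
```

Keeping each stage behind the same callable signature means a single module can be swapped or retrained without touching the rest, which is the practical appeal of a modular design over direct end-to-end 3D prediction.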

Taxonomy

Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization