DepthGAN: GAN-based Depth Generation of Indoor Scenes from Semantic Layouts
Yidi Li, Yiqun Wang, Zhengda Lu, and Jun Xiao

TL;DR
DepthGAN is a novel transformer-based model that generates accurate and structurally coherent depth maps from semantic layouts, improving 3D indoor scene reconstruction.
Contribution
Introduces DepthGAN with a cascade transformer generator and cross-attention fusion for enhanced depth map generation from semantic layouts.
Findings
Achieves superior quantitative performance on perspective views of the Structured3D dataset
Produces visually realistic depth maps
Enables coherent 3D indoor scene reconstruction
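To make the link between a generated depth map and a 3D reconstruction concrete, here is a minimal sketch of standard pinhole back-projection, which lifts each depth pixel to a 3D point. It is not taken from the paper: the function name `depth_to_points`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the toy depth values are all illustrative assumptions, and the depth is assumed to be measured along the optical axis (z-depth) rather than along the viewing ray.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a z-depth map (H, W) to an (H*W, 3) point cloud.

    Assumes a pinhole camera with hypothetical intrinsics fx, fy, cx, cy.
    """
    h, w = depth.shape
    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy example: a flat wall 2 units in front of the camera.
depth = np.full((4, 6), 2.0)
points = depth_to_points(depth, fx=3.0, fy=3.0, cx=2.5, cy=1.5)
```

Because every point is computed independently from its pixel, errors at depth edges translate directly into stray 3D points, which is one reason edge preservation in the depth map matters for reconstruction quality.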
Abstract
Generating complex 3D scenes remains challenging for existing generation networks, which are limited in computational efficiency and accuracy. In this work, we propose DepthGAN, a novel method for generating depth maps with only semantic layouts as input. First, we introduce a carefully designed cascade of transformer blocks as our generator to capture the structural correlations in depth maps, striking a balance between global feature aggregation and local attention. Meanwhile, we propose a cross-attention fusion module that exploits additional appearance supervision to guide edge preservation efficiently in depth generation. Finally, we conduct extensive experiments on the perspective views of the Structured3D panorama dataset and demonstrate that DepthGAN achieves superior performance in both quantitative results and visual effects in the depth generation…
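The cross-attention fusion idea in the abstract can be illustrated with a generic single-head cross-attention step, in which depth features form the queries and appearance features supply the keys and values. This is a minimal sketch under stated assumptions, not the authors' module: the function `cross_attention_fuse`, the token shapes, the random projection matrices, and the residual connection are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(depth_tokens, appearance_tokens, w_q, w_k, w_v):
    """Fuse appearance features into depth features (illustrative only).

    depth_tokens:      (N, C) features from the depth branch (queries)
    appearance_tokens: (M, C) features from the appearance branch (keys/values)
    w_q, w_k, w_v:     (C, C) projection matrices (learned in a real model)
    """
    q = depth_tokens @ w_q
    k = appearance_tokens @ w_k
    v = appearance_tokens @ w_v
    # Scaled dot-product attention: each depth token attends to all appearance tokens.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    # Residual connection (an assumption here) keeps the original depth features.
    fused = depth_tokens + attn @ v
    return fused, attn

rng = np.random.default_rng(0)
n_depth, n_app, dim = 16, 16, 8
depth_tokens = rng.normal(size=(n_depth, dim))
appearance_tokens = rng.normal(size=(n_app, dim))
w_q, w_k, w_v = (rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))
fused, attn = cross_attention_fuse(depth_tokens, appearance_tokens, w_q, w_k, w_v)
```

Each row of `attn` is a probability distribution over appearance tokens, so a depth token near an object boundary can draw on appearance features that mark that boundary.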
Taxonomy
Topics: Advanced Vision and Imaging · Remote Sensing and LiDAR Applications · 3D Surveying and Cultural Heritage
