LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network
Zhigang Jiang, Zhongzheng Xiang, Jinhua Xu, Ming Zhao

TL;DR
LGT-Net is a novel Transformer-based network that enhances indoor panoramic room layout estimation by integrating geometry-aware features and a specialized loss function, outperforming existing methods on benchmark datasets.
Contribution
The paper introduces LGT-Net, which combines an SWG-Transformer architecture with a geometry-aware loss to improve the accuracy of 3D room layout estimation using omnidirectional geometry cues.
Findings
Outperforms state-of-the-art methods on benchmark datasets.
Effectively models local and global geometry relations.
Utilizes horizon-depth and room height for comprehensive geometry awareness.
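To make the role of horizon-depth and room height concrete, the sketch below lifts a per-column horizon-depth vector and a single room height to 3D floor and ceiling boundary points. It is an illustration under stated assumptions, not the authors' code: the function name `layout_points`, the `camera_height` default, the column-to-longitude mapping, and the y-up axis convention are all hypothetical choices made for this example.

```python
import math

def layout_points(horizon_depth, room_height, camera_height=1.6):
    """Lift per-column horizon-depth to 3D floor and ceiling boundary points.

    Assumptions (not from the paper): the camera sits at the origin, y points
    up, columns are equally spaced in longitude, and camera_height is the
    camera's distance above the floor.
    """
    n = len(horizon_depth)
    floor, ceiling = [], []
    for i, d in enumerate(horizon_depth):
        # Longitude of the centre of column i, in [-pi, pi).
        theta = 2.0 * math.pi * (i + 0.5) / n - math.pi
        # Horizon-depth is the horizontal distance to the wall along theta.
        x, z = d * math.sin(theta), d * math.cos(theta)
        floor.append((x, -camera_height, z))
        ceiling.append((x, room_height - camera_height, z))
    return floor, ceiling

floor, ceiling = layout_points([2.0, 2.0, 2.0, 2.0], room_height=2.8)
```

Horizon-depth fixes the layout in the horizontal plane, while room height fixes the vertical offset between the floor and ceiling boundaries, which is why the two together describe the layout in both directions.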
Abstract
3D room layout estimation from a single panorama using deep neural networks has made great progress. However, previous approaches cannot obtain efficient geometry awareness of the room layout when they use only the latitude of boundaries or only the horizon-depth. We show that using horizon-depth along with room height provides omnidirectional-geometry awareness of the room layout in both the horizontal and vertical directions. In addition, we propose a planar-geometry aware loss function that uses normals and gradients of normals to supervise the planarity of walls and the turning of corners. We propose an efficient network, LGT-Net, for room layout estimation, which contains a novel Transformer architecture called SWG-Transformer to model geometry relations. SWG-Transformer consists of (Shifted) Window Blocks and Global Blocks to combine the local and global geometry relations. Moreover, we design a novel relative…
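The idea of supervising normals and gradients of normals can be sketched as follows. This is a minimal illustration and not the paper's exact formulation: the function names, the cosine-based normal term, the angle-based gradient term, and the equal weighting of the two terms are assumptions made for this example. Normals are computed from consecutive floor-plan boundary points derived from horizon-depth; the gradient of normals is taken as the angle between neighbouring normals, which is near zero along a flat wall and large at a corner.

```python
import math

def boundary_points(depth):
    """Project per-column horizon-depth onto the floor plane as (x, z) points."""
    n = len(depth)
    return [(d * math.sin(2 * math.pi * i / n), d * math.cos(2 * math.pi * i / n))
            for i, d in enumerate(depth)]

def normals(depth):
    """Unit normal of each segment between consecutive boundary points."""
    pts = boundary_points(depth)
    n = len(pts)
    out = []
    for i in range(n):
        (x0, z0), (x1, z1) = pts[i], pts[(i + 1) % n]  # wrap around the panorama
        dx, dz = x1 - x0, z1 - z0
        length = math.hypot(dx, dz) or 1.0  # guard against a zero-length segment
        out.append((dz / length, -dx / length))  # segment direction rotated by 90 degrees
    return out

def normal_gradients(nrm):
    """Angle between neighbouring normals: ~0 on a flat wall, large at a corner."""
    n = len(nrm)
    grads = []
    for i in range(n):
        a, b = nrm[i], nrm[(i + 1) % n]
        cos = max(-1.0, min(1.0, a[0] * b[0] + a[1] * b[1]))
        grads.append(math.acos(cos))
    return grads

def planar_geometry_loss(pred_depth, gt_depth):
    """Hypothetical loss: normal mismatch plus mismatch in gradients of normals."""
    n_pred, n_gt = normals(pred_depth), normals(gt_depth)
    g_pred, g_gt = normal_gradients(n_pred), normal_gradients(n_gt)
    k = len(pred_depth)
    normal_term = sum(1.0 - (p[0] * g[0] + p[1] * g[1]) for p, g in zip(n_pred, n_gt)) / k
    gradient_term = sum(abs(a - b) for a, b in zip(g_pred, g_gt)) / k
    return normal_term + gradient_term

gt = [2.0] * 8
pred = [2.0] * 7 + [2.5]
loss = planar_geometry_loss(pred, gt)
```

The normal term penalises walls whose orientation deviates from the ground truth, while the gradient term penalises spurious bends along a wall and missing or misplaced turns at corners.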
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Vision and Imaging · 3D Surveying and Cultural Heritage
Methods: Linear Layer · Average Pooling · Batch Normalization · 1x1 Convolution · Attentive Walk-Aggregating Graph Neural Network · Max Pooling · Global Average Pooling · Bottleneck Residual Block · Residual Block
