HEAT: Holistic Edge Attention Transformer for Structured Reconstruction
Jiacheng Chen, Yiming Qian, Yasutaka Furukawa

TL;DR
HEAT is an attention-based neural network that takes a 2D raster image and reconstructs a planar graph of the underlying geometric structure by detecting corners and classifying edge candidates between them. The authors report that it outperforms existing methods on structured reconstruction tasks.
Contribution
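The paper proposes a holistic edge classification architecture that initializes each edge candidate's feature from a trigonometric positional encoding of its end-points, fuses image features into each candidate with deformable attention, learns structural patterns over the candidates with two weight-sharing Transformer decoders, and is trained with a masked learning strategy.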
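The masked training idea can be illustrated with a small sketch. This summary does not spell out the masking scheme, so the following assumes one plausible reading: a random subset of ground-truth edge labels is hidden from the model, and the model would be trained to recover the hidden ones. The names `UNKNOWN`, `mask_labels`, and `mask_ratio` are hypothetical and not taken from the paper.

```python
import random

UNKNOWN = 2  # hypothetical placeholder label meaning "hidden from the model"


def mask_labels(gt_labels, mask_ratio, rng):
    """Hide a random subset of ground-truth edge labels (assumed scheme).

    Returns (inputs, targets):
      inputs  - the label list with UNKNOWN at the masked positions
      targets - {index: true label} for the masked positions only
    """
    n = len(gt_labels)
    n_masked = round(n * mask_ratio)
    masked = set(rng.sample(range(n), n_masked))
    inputs = [UNKNOWN if i in masked else y for i, y in enumerate(gt_labels)]
    targets = {i: gt_labels[i] for i in masked}
    return inputs, targets


rng = random.Random(0)          # seeded for repeatability
gt = [1, 0, 0, 1, 1, 0]         # 1 = edge is present, 0 = edge is absent
inputs, targets = mask_labels(gt, 0.5, rng)
```

In this sketch a model would see `inputs` alongside the edge features and be scored only on the entries in `targets`, which forces it to use the visible labels of other edges as context.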
Findings
Outperforms the state of the art on outdoor building and indoor floorplan reconstruction
Detects corners and classifies edge candidates end to end
Performs well on both structured reconstruction tasks evaluated
Abstract
This paper presents a novel attention-based neural network for structured reconstruction, which takes a 2D raster image as an input and reconstructs a planar graph depicting an underlying geometric structure. The approach detects corners and classifies edge candidates between corners in an end-to-end manner. Our contribution is a holistic edge classification architecture, which 1) initializes the feature of an edge candidate by a trigonometric positional encoding of its end-points; 2) fuses image feature to each edge candidate by deformable attention; 3) employs two weight-sharing Transformer decoders to learn holistic structural patterns over the graph edge candidates; and 4) is trained with a masked learning strategy. The corner detector is a variant of the edge classification architecture, adapted to operate on pixels as corner candidates. We conduct experiments on two structured…
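Step 1 of the abstract, initializing an edge candidate's feature from a trigonometric positional encoding of its end-points, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes every unordered pair of detected corners is an edge candidate, uses the standard sinusoidal encoding with a temperature of 10000, and encodes each of the four end-point coordinates separately. The function names, `num_freqs`, and the corner coordinates are hypothetical.

```python
import math
from itertools import combinations


def sinusoidal_encoding(value, num_freqs=4, temperature=10000.0):
    """Encode one scalar coordinate as sin/cos pairs at geometrically spaced frequencies."""
    feats = []
    for i in range(num_freqs):
        freq = 1.0 / (temperature ** (i / num_freqs))
        feats.append(math.sin(value * freq))
        feats.append(math.cos(value * freq))
    return feats


def edge_feature(p, q, num_freqs=4):
    """Initial edge feature: concatenated encodings of (x1, y1, x2, y2)."""
    feats = []
    for coord in (*p, *q):
        feats.extend(sinusoidal_encoding(coord, num_freqs))
    return feats


def edge_candidates(corners):
    """Assumption: every unordered pair of corners is an edge candidate."""
    return list(combinations(range(len(corners)), 2))


corners = [(10.0, 20.0), (200.0, 20.0), (200.0, 180.0)]  # hypothetical pixel coordinates
candidates = edge_candidates(corners)
features = [edge_feature(corners[i], corners[j]) for i, j in candidates]
print(len(candidates), len(features[0]))  # prints "3 32"
```

Each feature has 4 coordinates × 4 frequencies × 2 (sin and cos) = 32 entries. In the architecture described above, such a feature would then be fused with image features and passed to the Transformer decoders; those stages are not sketched here.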
Taxonomy
Topics: 3D Surveying and Cultural Heritage · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Label Smoothing · Dense Connections · Absolute Position Encodings · Softmax · Residual Connection
