HTC-DC Net: Monocular Height Estimation from Single Remote Sensing Images
Sining Chen, Yilei Shi, Zhitong Xiong, Xiao Xiang Zhu

TL;DR
This paper introduces HTC-DC Net, a transformer-based method for estimating height from a single remote sensing image. It follows a classification-regression paradigm and explicitly addresses the long-tailed distribution of height values, which otherwise biases networks toward underestimating building heights.
Contribution
The paper proposes HTC-DC Net with HTC-AdaBins and distribution-based constraints, introducing a new classification-regression approach for monocular height estimation from remote sensing data.
Findings
Outperforms existing methods on three datasets
Effectively handles long-tailed height distributions
Demonstrates the effectiveness of each component through ablation studies
Abstract
3D geo-information is of great significance for understanding the living environment; however, 3D perception from remote sensing data, especially on a large scale, is restricted. To tackle this problem, we propose a method for monocular height estimation from optical imagery, which is currently one of the richest sources of remote sensing data. As an ill-posed problem, monocular height estimation requires well-designed networks for enhanced representations to improve performance. Moreover, the distribution of height values is long-tailed with the low-height pixels, e.g., the background, as the head, and thus trained networks are usually biased and tend to underestimate building heights. To solve the problems, instead of formalizing the problem as a regression task, we propose HTC-DC Net following the classification-regression paradigm, with the head-tail cut (HTC) and the…
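The classification-regression paradigm named in the abstract can be illustrated with a minimal sketch. It assumes the generic adaptive-bins formulation (as popularized by AdaBins): the network predicts a set of bin widths over the height range plus per-pixel scores over those bins, and the final height is the probability-weighted average of the bin centers. The function names, the 0 to 40 m range, and the logit values below are hypothetical, and the head-tail cut and distribution-based constraints are not modeled, so this is not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bin_centers(width_logits, h_min, h_max):
    """Turn unnormalized width scores into bin centers spanning [h_min, h_max]."""
    widths = softmax(width_logits)  # normalized widths sum to 1
    centers, left = [], h_min
    for w in widths:
        span = w * (h_max - h_min)
        centers.append(left + span / 2.0)
        left += span
    return centers


def expected_height(class_logits, centers):
    """Regression step: probability-weighted average of the bin centers."""
    probs = softmax(class_logits)  # classification step over bins
    return sum(p * c for p, c in zip(probs, centers))


# Hypothetical values for a single pixel: 4 bins over an assumed 0-40 m range.
centers = bin_centers([0.0, 0.0, 0.0, 0.0], h_min=0.0, h_max=40.0)
height = expected_height([0.0, 0.0, 5.0, 0.0], centers)
print(centers, round(height, 2))
```

Because the output is a soft average rather than a hard bin index, the prediction stays continuous even though the intermediate step is a classification over bins.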
Taxonomy
Topics: Advanced Vision and Imaging · Remote Sensing and LiDAR Applications · Video Surveillance and Tracking Methods
Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention · Feature Pyramid Network · Residual Connection · 1x1 Convolution · Region Proposal Network · RoIAlign · Convolution
