Contextual Learning in Fourier Complex Field for VHR Remote Sensing Images
Yan Zhang, Xiyuan Gao, Qingyan Duan, Jiaxu Leng, Xiao Pu, Xinbo Gao

TL;DR
This paper introduces a Fourier-based complex self-attention mechanism for transformer models, enabling efficient and effective classification of very high-resolution remote sensing images by reducing computational complexity.
Contribution
It proposes a novel Fourier complex transformer with a complex self-attention mechanism that models high-order contextual information efficiently for VHR RS image classification.
Findings
The Fourier complex transformer (FCT) outperforms existing models on VHR RS datasets.
The complex self-attention (CSA) mechanism reduces computational cost by over 50%.
Logmax normalization stabilizes training.
Abstract
Very high-resolution (VHR) remote sensing (RS) image classification is a fundamental task in RS image analysis and understanding. Recently, transformer-based models have demonstrated outstanding potential for learning high-order contextual relationships from natural images of general resolution (224x224 pixels) and have achieved remarkable results on general image classification tasks. However, the complexity of the naive transformer grows quadratically with the increase in image size, which prevents transformer-based models from being applied to VHR RS image (500x500 pixels) classification and other computationally expensive downstream tasks. To this end, we decompose the expensive self-attention (SA) into real and imaginary parts via the discrete Fourier transform (DFT) and propose an efficient complex self-attention (CSA) mechanism. Benefiting from the conjugated symmetric property of…
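Two ideas in the abstract can be illustrated with a small sketch: the quadratic growth of self-attention with image size, and the conjugate symmetry of the DFT of a real-valued signal. This is not the paper's CSA mechanism, whose details are cut off above. The patch size of 16 and the helper names `attention_pairs` and `dft` are assumptions made for illustration only.

```python
import cmath


def attention_pairs(image_side, patch_side):
    """Count the token pairs a full self-attention map has to score."""
    tokens = (image_side // patch_side) ** 2
    return tokens * tokens


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a 1-D sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Assumption: 16x16 patches. The source does not state a patch size.
pairs_224 = attention_pairs(224, 16)  # 14 * 14 = 196 tokens
pairs_500 = attention_pairs(500, 16)  # 31 * 31 = 961 tokens
growth = pairs_500 / pairs_224        # roughly 24x more pairs to score

# For a real-valued input, the spectrum satisfies X[k] == conj(X[N - k]),
# so about half of the coefficients carry all of the information.
signal = [0.5, -1.0, 2.0, 3.5, 0.0, 1.25, -2.0, 4.0]
spectrum = dft(signal)
n = len(signal)
symmetric = all(
    abs(spectrum[k] - spectrum[(n - k) % n].conjugate()) < 1e-9
    for k in range(n)
)
independent = n // 2 + 1  # coefficients needed to rebuild the full spectrum

print(pairs_224, pairs_500, round(growth, 1), symmetric, independent)
```

Under these assumptions, going from 224x224 to 500x500 pixels multiplies the number of attention pairs by about 24, while the symmetry check shows why roughly half of a real signal's Fourier coefficients are redundant. How the paper turns that redundancy into its reported cost reduction is not recoverable from the truncated abstract.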
Taxonomy
Topics: Remote-Sensing Image Classification · Image and Signal Denoising Methods · Seismic Imaging and Inversion Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Layer Normalization · Byte Pair Encoding
