Contextual Learning in Fourier Complex Field for VHR Remote Sensing Images

Yan Zhang; Xiyuan Gao; Qingyan Duan; Jiaxu Leng; Xiao Pu; Xinbo Gao

arXiv:2210.15972·cs.CV·October 31, 2022

TL;DR

This paper introduces a Fourier-based complex self-attention mechanism for transformer models, enabling efficient and effective classification of very high-resolution remote sensing images by reducing computational complexity.

Contribution

The paper proposes a Fourier complex transformer (FCT) built on a complex self-attention (CSA) mechanism that efficiently models high-order contextual information for VHR RS image classification.

Findings

01. FCT outperforms existing models on VHR RS datasets.

02. CSA reduces computational cost by over 50%.

03. Logmax normalization stabilizes training of the model.

Abstract

Very high-resolution (VHR) remote sensing (RS) image classification is a fundamental task in RS image analysis and understanding. Recently, transformer-based models have demonstrated outstanding potential for learning high-order contextual relationships from natural images of general resolution (224x224 pixels) and have achieved remarkable results on general image classification tasks. However, the complexity of the naive transformer grows quadratically with image size, which prevents transformer-based models from being applied to VHR RS image (500x500 pixels) classification and other computationally expensive downstream tasks. To this end, we propose to decompose the expensive self-attention (SA) into real and imaginary parts via the discrete Fourier transform (DFT), and we therefore propose an efficient complex self-attention (CSA) mechanism. Benefiting from the conjugate symmetric property of…
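The two ideas the abstract leans on, quadratic attention cost and the conjugate symmetry of the DFT, can be illustrated with a short NumPy sketch. This is not the paper's CSA mechanism, whose details are not given in the text above. The helper names (`attention_pairs`, `half_spectrum`, `full_spectrum_from_half`), the patch size of 16, and the toy token shape are all assumptions made for illustration only.

```python
import numpy as np


def attention_pairs(side, patch=16):
    """Count token-pair interactions in naive self-attention.

    `patch=16` is an assumed patch size, not a value taken from the paper.
    """
    tokens = (side // patch) ** 2
    return tokens * tokens  # cost grows quadratically with the token count


def half_spectrum(x):
    """DFT of a real-valued token sequence, keeping only the non-redundant half."""
    return np.fft.rfft(x, axis=0)


def full_spectrum_from_half(half, n):
    """Rebuild the length-n DFT using conjugate symmetry: X[n - k] = conj(X[k])."""
    m = half.shape[0]  # equals n // 2 + 1
    full = np.empty((n,) + half.shape[1:], dtype=complex)
    full[:m] = half
    ks = np.arange(m, n)  # indices that were not stored
    full[ks] = np.conj(half[n - ks])
    return full


# Toy input: 8 real-valued tokens with 4 channels each (hypothetical sizes).
rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 4))

half = half_spectrum(tokens)
full = np.fft.fft(tokens, axis=0)
rebuilt = full_spectrum_from_half(half, tokens.shape[0])

print("stored coefficients per channel:", half.shape[0], "of", full.shape[0])
print("half spectrum recovers full DFT:", np.allclose(rebuilt, full))
print("pair-count ratio, 500px vs 224px:", attention_pairs(500) / attention_pairs(224))
```

Because a real-valued input has a conjugate-symmetric spectrum, roughly half of the complex coefficients are redundant. That is one plausible reading of where savings in a Fourier-domain attention could come from, though the truncated abstract does not confirm the exact construction.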

Taxonomy

Topics: Remote-Sensing Image Classification · Image and Signal Denoising Methods · Seismic Imaging and Inversion Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Layer Normalization · Byte Pair Encoding