Cross Aggregation Transformer for Image Restoration
Zheng Chen, Yulun Zhang, Jinjin Gu, Yongbing Zhang, Linghe Kong, Xin Yuan

TL;DR
This paper introduces the Cross Aggregation Transformer (CAT), a novel image restoration model that combines rectangle-window self-attention with local-global feature coupling to improve long-range dependency modeling and performance.
Contribution
The paper proposes the Cross Aggregation Transformer with Rectangle-Window Self-Attention and Axial-Shift, integrating CNN inductive biases for enhanced image restoration.
Findings
Outperforms state-of-the-art methods on multiple image restoration tasks
Effectively models long-range dependencies with rectangle-window attention
Enhances local-global feature integration through the Locality Complementary Module
Abstract
Recently, the Transformer architecture has been introduced into image restoration to replace convolutional neural networks (CNNs), with surprising results. Considering the high computational complexity of Transformers with global attention, some methods use a local square window to limit the scope of self-attention. However, these methods lack direct interaction among different windows, which limits the establishment of long-range dependencies. To address this issue, we propose a new image restoration model, Cross Aggregation Transformer (CAT). The core of our CAT is the Rectangle-Window Self-Attention (Rwin-SA), which applies horizontal and vertical rectangle-window attention in different heads in parallel to expand the attention area and aggregate features across different windows. We also introduce the Axial-Shift operation for interaction between different windows. Furthermore, we propose…
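As a rough illustration of the rectangle-window idea in the abstract, the Python sketch below lists which grid positions a single query pixel shares a window with when some heads use wide (horizontal) windows and others use tall (vertical) ones. The function names `rect_windows` and `attended_positions`, the 8x8 grid, the 2x4 and 4x2 window sizes, the shift amounts, and the cyclic roll used for the shift are all illustrative assumptions rather than details taken from the paper. No attention weights are computed, and the masking that a shifted-window scheme normally needs at wrapped borders is omitted.

```python
from typing import List, Set, Tuple

Pos = Tuple[int, int]  # (row, column) on the feature grid


def rect_windows(height: int, width: int, win_h: int, win_w: int,
                 shift_h: int = 0, shift_w: int = 0) -> List[List[Pos]]:
    """Split a height x width grid into non-overlapping win_h x win_w windows.

    A non-zero shift cyclically offsets the window grid. This is one simple
    way to model a shifted partition and is an assumption of this sketch.
    """
    if height % win_h or width % win_w:
        raise ValueError("grid size must be divisible by the window size")
    windows = []
    for top in range(0, height, win_h):
        for left in range(0, width, win_w):
            windows.append([
                ((top + r + shift_h) % height, (left + c + shift_w) % width)
                for r in range(win_h)
                for c in range(win_w)
            ])
    return windows


def attended_positions(query: Pos, windows: List[List[Pos]]) -> Set[Pos]:
    """Return every position that shares a window with the query position."""
    for window in windows:
        if query in window:
            return set(window)
    raise ValueError("query position lies outside the grid")


H = W = 8
query = (0, 0)

# Heads of the first kind use wide windows, heads of the second kind tall ones.
horizontal = rect_windows(H, W, win_h=2, win_w=4)
vertical = rect_windows(H, W, win_h=4, win_w=2)

# Combining both kinds of heads gives the query a cross-shaped attention area
# that is larger than either rectangle alone.
cross_area = attended_positions(query, horizontal) | attended_positions(query, vertical)

# A shifted partition (hypothetical shift of half a window per axis) lets the
# query reach positions that fell in a different window before the shift.
shifted = rect_windows(H, W, win_h=2, win_w=4, shift_h=1, shift_w=2)
shifted_area = attended_positions(query, shifted)
```

In this toy setting each rectangle holds 8 positions and the two overlap in a 2x2 block, so the combined area covers 12 positions. The shifted partition places the query in a window that wraps around the grid border, which is exactly the case a real implementation would have to mask.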
Taxonomy
Topics: Advanced Image Processing Techniques · Image and Signal Denoising Methods · Advanced Image Fusion Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Adam · Softmax · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing · Convolution
