Recursive Generalization Transformer for Image Super-Resolution

Zheng Chen; Yulun Zhang; Jinjin Gu; Linghe Kong; Xiaokang Yang

arXiv:2303.06373 · cs.CV · February 26, 2024 · 23 cites

TL;DR

The paper introduces the Recursive Generalization Transformer (RGT) for image super-resolution, which captures global context by recursively aggregating features and applying cross-attention, and reports better results than existing methods.

Contribution

It proposes a novel recursive-generalization self-attention mechanism combined with local attention and a hybrid adaptive integration for improved image super-resolution.

Findings

01. RGT achieves superior quantitative results on benchmark datasets.

02. The model effectively captures global spatial information.

03. Extensive experiments validate the effectiveness of the proposed approach.

Abstract

Transformer architectures have exhibited remarkable performance in image super-resolution (SR). Because of the quadratic computational complexity of self-attention (SA) in Transformers, existing methods tend to adopt SA in a local region to reduce overhead. However, the local design restricts the exploitation of global context, which is crucial for accurate image reconstruction. In this work, we propose the Recursive Generalization Transformer (RGT) for image SR, which can capture global spatial information and is suitable for high-resolution images. Specifically, we propose the recursive-generalization self-attention (RG-SA). It recursively aggregates input features into representative feature maps, and then utilizes cross-attention to extract global information. Meanwhile, the channel dimensions of attention matrices (query, key, and value) are further scaled to mitigate the redundancy in…
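The aggregate-then-cross-attend idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the 2x2 average pooling used for aggregation, the absence of learned query/key/value projections and channel scaling, and all function names (`avg_pool2x2`, `recursive_aggregate`, `cross_attention`, `rg_sa_sketch`) are assumptions made only for illustration.

```python
import math

def avg_pool2x2(fmap):
    """Halve the height and width of an H x W x C map by 2x2 average pooling.
    (Assumption: a stand-in for whatever aggregation the paper actually uses.)"""
    H, W, C = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[[sum(fmap[2 * i + di][2 * j + dj][c]
                  for di in (0, 1) for dj in (0, 1)) / 4.0
              for c in range(C)]
             for j in range(W // 2)]
            for i in range(H // 2)]

def recursive_aggregate(fmap, h):
    """Apply pooling repeatedly until the map is no larger than h x h (h >= 1)."""
    while len(fmap) > h and len(fmap[0]) > h:
        fmap = avg_pool2x2(fmap)
    return fmap

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, reps):
    """Each query token attends to every representative token.
    No learned projections are applied (assumption for brevity)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in reps]
        weights = softmax(scores)
        out.append([sum(w * r[c] for w, r in zip(weights, reps))
                    for c in range(d)])
    return out

def flatten(fmap):
    """Turn an H x W x C map into a list of H*W tokens of length C."""
    return [tok for row in fmap for tok in row]

def rg_sa_sketch(fmap, h):
    """Queries come from the full map; keys/values from the small map."""
    reps = flatten(recursive_aggregate(fmap, h))
    return cross_attention(flatten(fmap), reps), len(reps)
```

The point of the design is that the attention score matrix has N x h² entries instead of N x N, where N is the number of input tokens. For a 64 x 64 feature map (4096 tokens), full self-attention scores 4096 x 4096 = 16,777,216 query-key pairs, whereas attending to a 16 x 16 representative map scores 4096 x 256 = 1,048,576. This counts score entries only and is not the FLOPs figure reported in the paper.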

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 8 · accept, good paper · Confidence 5

Strengths

1. The idea of using global attention in the Transformer is widespread, but the authors effectively maintain low computational complexity, which is meaningful in image SR.
2. Additionally, the proposed HAI is simple yet effective. Both the ablation experiments (Table 1 (c), (d)) and the visual results (Figs. 3, 4, 5) strongly support the authors' claim: integrate global and local modules.
3. The main comparisons with recent methods demonstrate the superiority of this method. I also notice that

Weaknesses

1. The experiments on RG-SA are not enough. The authors claim the superiority of RG-SA, but it is not compared with other global attention mechanisms.
2. The authors only provide visual comparisons on Urban100 and Manga109 datasets. Comparisons on other datasets are lacking.

Reviewer 02 · Rating 6 · marginally above the acceptance threshold · Confidence 5

Strengths

1. The authors propose the recursive-generalization self-attention (RG-SA), which controls computational complexity while achieving global modeling.
2. They also design the hybrid adaptive integration (HAI). It is simple yet effective.
3. The paper's experiments are comprehensive. The ablation study demonstrates the effects of each component.
4. Quantitative and qualitative results indicate that the proposed method outperforms SwinIR and CAT-A.
5. The authors provide various visual results: fe

Weaknesses

1. Some details in the paper are not clear. For example, the representative map size "h" is set as 4 for training but 16 for testing. Why not use the same settings?
2. RGT-S and RGT both adopt a larger window size than SwinIR. To establish a fairer comparison, it is recommended to use the same window size.
3. It would be beneficial to include comparisons with more recent methods, such as RGT, to evaluate the effectiveness of the proposed method.
4. A comparison of running times should be give

Reviewer 03 · Rating 8 · accept, good paper · Confidence 5

Strengths

- The paper's writing and organization are good. All illustrations, tables, and visual results are intuitive and clear.
- The motivation for the proposed method is reasonable. The global information in SR is important while reducing the complexity of global attention is crucial for its application in SR tasks.
- The proposed components RG-SA and HAI are novel and valuable.
- The ablation study is extensive. The effectiveness of each part in RGT is demonstrated.
- The authors provide multiple mo

Weaknesses

- When compared with CAT-A, the improvements of RGT on some datasets (Set5, Set14) are not very obvious (< 0.1 dB).
- Although FLOPs are provided in Sec. 4.4, the running time of the model on real devices should also be provided.
- The primary evaluation metrics used in the paper are PSNR and SSIM. However, these metrics may not reflect actual SR performance. Some perceptual metrics, such as LPIPS, should be evaluated.
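As background for the PSNR figures discussed in the reviews, the sketch below computes PSNR from the mean squared error using the standard definition, 10 · log10(MAX² / MSE). The function name `psnr`, the flat-list input format, and the default peak value of 255 are assumptions for illustration; the paper's exact evaluation protocol (color space, border cropping) is not stated here and is not reproduced.

```python
import math

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists.
    max_value is the largest possible pixel value (255 for 8-bit images)."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because PSNR is logarithmic in the MSE, a gain of 0.1 dB corresponds to an MSE ratio of 10^0.01 ≈ 1.023, that is, roughly 2.3% lower mean squared error.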

Code & Models

Repositories

zhengchen1999/rgt · PyTorch · Official

Videos

Recursive Generalization Transformer for Image Super-Resolution · SlidesLive

Taxonomy

Topics: Advanced Image Processing Techniques · Advanced Image Fusion Techniques · Image Processing Techniques and Applications

Methods: Multi-Head Attention · Attention Is All You Need · Residual Connection · Dense Connections · Absolute Position Encodings · Linear Layer · Label Smoothing · Dropout · Adam · Softmax