Transformer for Single Image Super-Resolution

Zhisheng Lu; Juncheng Li; Hong Liu; Chaoyan Huang; Linlin Zhang; Tieyong Zeng

arXiv:2108.11084 · cs.CV · April 25, 2022 · 5 cites

Open Access · 1 Repo

TL;DR

This paper introduces ESRT, a hybrid Transformer-CNN model for single image super-resolution that achieves competitive results with significantly reduced computational costs and GPU memory usage.

Contribution

The paper proposes a novel Efficient Super-Resolution Transformer (ESRT) combining lightweight CNN and Transformer backbones with an efficient attention mechanism.

Findings

01

ESRT achieves competitive super-resolution results.

02

ESRT occupies only 4,191M of GPU memory, compared with 16,057M for the original Transformer.

03

Extensive experiments validate the efficiency and effectiveness of ESRT.
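
The memory figures in finding 02 can be put in relative terms with a little arithmetic. The sketch below is only a back-of-the-envelope check: the constant and function names are illustrative, and "M" is kept as the unit reported in the finding.

```python
# GPU memory figures quoted in finding 02 (unit "M" as reported there).
ESRT_MEMORY_M = 4191
ORIGINAL_TRANSFORMER_MEMORY_M = 16057


def memory_ratio(small: float, large: float) -> float:
    """Return small / large: the fraction of memory the lighter model needs."""
    return small / large


ratio = memory_ratio(ESRT_MEMORY_M, ORIGINAL_TRANSFORMER_MEMORY_M)
reduction = 1.0 - ratio

print(f"ESRT uses {ratio:.1%} of the original Transformer's memory")
print(f"Relative reduction: {reduction:.1%}")
```

On these numbers, ESRT needs roughly a quarter of the memory (about 26.1%), which is a reduction of about 73.9%.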

Abstract

Single image super-resolution (SISR) has made great strides with the development of deep learning. However, most existing studies focus on building more complex networks with a massive number of layers. Recently, more and more researchers have started to explore the application of Transformers to computer vision tasks. However, the heavy computational cost and high GPU memory occupation of the vision Transformer cannot be ignored. In this paper, we propose a novel Efficient Super-Resolution Transformer (ESRT) for SISR. ESRT is a hybrid model consisting of a Lightweight CNN Backbone (LCB) and a Lightweight Transformer Backbone (LTB). The LCB can dynamically adjust the size of the feature map to extract deep features at a low computational cost. The LTB is composed of a series of Efficient Transformers (ET), which occupy little GPU memory, thanks to the specially…
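
The abstract's point about GPU memory occupation can be made concrete with a rough estimate. In standard multi-head self-attention, each head stores an N×N score matrix for N tokens, so the memory for the scores alone grows quadratically with the number of tokens. The sketch below counts only that matrix. The function name, head count, and token counts are hypothetical and not taken from the paper, and the sketch does not model ESRT's own attention mechanism.

```python
def attention_scores_bytes(num_tokens: int, num_heads: int,
                           bytes_per_value: int = 4) -> int:
    """Bytes needed to hold the N x N attention scores for every head.

    Assumes standard (dense) self-attention with float32 scores by default;
    activations, weights, and gradients are deliberately ignored.
    """
    return num_heads * num_tokens * num_tokens * bytes_per_value


# Hypothetical token counts, chosen only to show the quadratic growth.
for n in (1024, 2048, 4096):
    mib = attention_scores_bytes(n, num_heads=8) / 2**20
    print(f"{n:5d} tokens -> {mib:8.1f} MiB of attention scores")
```

Doubling the number of tokens quadruples this term, which is why attention over large image feature maps is costly and why a lighter attention design can pay off.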

Code & Models

Repositories

luissen/esrt
PyTorch · Official

Taxonomy

Topics: Optical Coherence Tomography Applications · Advanced Image Processing Techniques · Advanced Optical Sensing Technologies

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Vision Transformer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Label Smoothing · Dense Connections · Residual Connection