Transformer for Single Image Super-Resolution
Zhisheng Lu, Juncheng Li, Hong Liu, Chaoyan Huang, Linlin Zhang, Tieyong Zeng

TL;DR
This paper introduces ESRT, a hybrid Transformer-CNN model for single image super-resolution that achieves competitive results with significantly reduced computational costs and GPU memory usage.
Contribution
The paper proposes a novel Efficient Super-Resolution Transformer (ESRT) combining lightweight CNN and Transformer backbones with an efficient attention mechanism.
Findings
ESRT achieves competitive super-resolution results.
ESRT occupies only 4,191M of GPU memory, compared with 16,057M for the original Transformer.
Extensive experiments validate the efficiency and effectiveness of ESRT.
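To put the memory figures above in proportion, the short calculation below works out the relative saving. It is only an arithmetic sketch on the two reported numbers; the variable names are illustrative, and the unit "M" is kept exactly as reported.

```python
# GPU memory figures as reported in the Findings (unit "M" as given in the source).
esrt_mem = 4191        # ESRT
baseline_mem = 16057   # original Transformer

ratio = esrt_mem / baseline_mem    # fraction of the baseline that ESRT occupies
reduction = 1 - ratio              # fraction of memory saved
factor = baseline_mem / esrt_mem   # how many times smaller the footprint is

print(f"ESRT occupies {ratio:.1%} of the baseline memory")
print(f"Memory saved: {reduction:.1%}")
print(f"Footprint is {factor:.2f}x smaller")
```

By these figures, ESRT needs roughly a quarter of the memory of the original Transformer.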
Abstract
Single image super-resolution (SISR) has witnessed great strides with the development of deep learning. However, most existing studies focus on building more complex networks with a massive number of layers. Recently, a growing number of researchers have started to explore the application of Transformers in computer vision tasks. However, the heavy computational cost and high GPU memory occupation of the vision Transformer cannot be ignored. In this paper, we propose a novel Efficient Super-Resolution Transformer (ESRT) for SISR. ESRT is a hybrid model that consists of a Lightweight CNN Backbone (LCB) and a Lightweight Transformer Backbone (LTB). Of the two, LCB can dynamically adjust the size of the feature map to extract deep features at a low computational cost. LTB is composed of a series of Efficient Transformers (ET), which occupy little GPU memory, thanks to the specially…
Taxonomy
Topics: Optical Coherence Tomography Applications · Advanced Image Processing Techniques · Advanced Optical Sensing Technologies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Vision Transformer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Label Smoothing · Dense Connections · Residual Connection
