Training Transformer Models by Wavelet Losses Improves Quantitative and Visual Performance in Single Image Super-Resolution

Cansu Korkmaz; A. Murat Tekalp

arXiv:2404.11273 · eess.IV · April 18, 2024 · 1 citation


TL;DR

This paper introduces wavelet losses and convolutional non-local sparse attention blocks to enhance Transformer-based image super-resolution, achieving state-of-the-art quantitative and visual results.

Contribution

It presents a novel combination of wavelet losses with an extended hybrid Transformer architecture for improved super-resolution performance.

Findings

01. Achieves state-of-the-art PSNR results.

02. Provides superior visual quality in super-resolution tasks.

03. Demonstrates effectiveness of wavelet losses in training Transformer models.

Abstract

Transformer-based models have achieved remarkable results in low-level vision tasks including image super-resolution (SR). However, early Transformer-based approaches that rely on self-attention within non-overlapping windows encounter challenges in acquiring global information. To activate more input pixels globally, hybrid attention models have been proposed. Moreover, training by solely minimizing pixel-wise RGB losses, such as L1, has been found inadequate for capturing essential high-frequency details. This paper presents two contributions: i) We introduce convolutional non-local sparse attention (NLSA) blocks to extend the hybrid Transformer architecture in order to further enhance its receptive field. ii) We employ wavelet losses to train Transformer models to improve quantitative and subjective performance. While wavelet losses have been explored previously, showing their power…
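To make the idea of a wavelet loss concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say which wavelet or transform is used, so the single-level Haar transform, the equal subband weights, the `lam` balancing factor, and all function names below are assumptions chosen for illustration only.

```python
import numpy as np


def haar_dwt2(img):
    """Single-level 2D orthonormal Haar transform (assumed wavelet, for illustration).

    Works on arrays of shape (..., H, W) with even H and W and returns four
    subbands, each of shape (..., H/2, W/2): one low-pass band followed by
    three detail bands.
    """
    x = np.asarray(img, dtype=np.float64)
    a = x[..., 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0          # local average (low-frequency content)
    diff_cols = (a - b + c - d) / 2.0    # differences between adjacent columns
    diff_rows = (a + b - c - d) / 2.0    # differences between adjacent rows
    diff_diag = (a - b - c + d) / 2.0    # diagonal differences
    return low, diff_cols, diff_rows, diff_diag


def wavelet_l1_loss(sr, hr, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of mean absolute errors between corresponding subbands.

    `weights` are placeholder values, not taken from the paper.
    """
    sr_bands = haar_dwt2(sr)
    hr_bands = haar_dwt2(hr)
    return sum(
        w * float(np.mean(np.abs(s - h)))
        for w, s, h in zip(weights, sr_bands, hr_bands)
    )


def total_loss(sr, hr, lam=0.5):
    """Pixel-wise L1 loss plus a wavelet-domain term scaled by a hypothetical `lam`."""
    sr = np.asarray(sr, dtype=np.float64)
    hr = np.asarray(hr, dtype=np.float64)
    pixel_l1 = float(np.mean(np.abs(sr - hr)))
    return pixel_l1 + lam * wavelet_l1_loss(sr, hr)
```

In this sketch the three detail bands penalize errors in edges and texture separately from the low-pass band, which is one way a wavelet-domain term can emphasize high-frequency content that a pixel-wise L1 loss alone tends to underweight. Raising the detail-band weights relative to the low-pass weight would shift the emphasis further toward fine detail.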


Code & Models

Repositories

mandalinadagi/wavelettention (PyTorch, official)


Taxonomy

Topics: Advanced Image Fusion Techniques · Photoacoustic and Ultrasonic Imaging · Advanced Image Processing Techniques

Methods: Attention Is All You Need · Dropout · Adam · Position-Wise Feed-Forward Layer · Linear Layer · Layer Normalization · Byte Pair Encoding · Absolute Position Encodings · Multi-Head Attention · Dense Connections