Video Super-Resolution Transformer

Jiezhang Cao; Yawei Li; Kai Zhang; Luc Van Gool

arXiv:2106.06847 · cs.CV · July 6, 2023 · 132 cites

Open Access · 1 Repo

TL;DR

This paper introduces a novel Transformer-based model for video super-resolution that incorporates spatial-temporal convolutional self-attention and optical flow-based feature alignment, significantly improving performance on benchmark datasets.

Contribution

It is the first work to adapt the Transformer architecture specifically for VSR, addressing the data locality and feature alignment issues with a new attention layer and a new feed-forward layer.

Findings

01. Outperforms existing VSR methods on benchmark datasets

02. Effectively exploits spatial-temporal locality in video data

03. Demonstrates the importance of feature alignment for VSR

Abstract

Video super-resolution (VSR), which aims to restore a high-resolution video from its low-resolution counterpart, is a spatial-temporal sequence prediction problem. Recently, the Transformer has been gaining popularity due to its parallel computing ability for sequence-to-sequence modeling. Thus, it seems straightforward to apply the vision Transformer to VSR. However, the typical Transformer block design, with a fully connected self-attention layer and a token-wise feed-forward layer, does not fit VSR well for two reasons. First, the fully connected self-attention layer fails to exploit data locality because it relies on linear layers to compute attention maps. Second, the token-wise feed-forward layer lacks feature alignment, which is important for VSR, because it processes each input token embedding independently…
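The two limitations named in the abstract can be illustrated with a toy sketch. This is not the paper's layer or code. It assumes a tiny sequence of 5 tokens with 4 channels each, and the helper names (`tokenwise_linear`, `conv1d_same`, `changed_rows`) are hypothetical. It only shows that a token-wise linear map never mixes information across tokens, while a convolution along the token axis does.

```python
import random

def tokenwise_linear(tokens, weight):
    """Apply the same linear map to every token independently (no cross-token mixing)."""
    return [[sum(w * x for w, x in zip(row, tok)) for row in weight] for tok in tokens]

def conv1d_same(tokens, kernel):
    """3-tap convolution along the token axis with zero padding.

    kernel[k] is a scalar weight for the token at offset k - 1, so every
    output token mixes its left neighbour, itself, and its right neighbour.
    """
    n, d = len(tokens), len(tokens[0])
    zero = [0.0] * d
    padded = [zero] + tokens + [zero]
    return [
        [sum(kernel[k] * padded[i + k][c] for k in range(3)) for c in range(d)]
        for i in range(n)
    ]

def changed_rows(a, b):
    """Indices of output tokens that differ between two runs."""
    return [i for i in range(len(a)) if a[i] != b[i]]

random.seed(0)
tokens = [[random.random() for _ in range(4)] for _ in range(5)]   # toy input, assumed sizes
weight = [[random.random() for _ in range(4)] for _ in range(4)]   # toy linear weights
kernel = [0.25, 0.5, 0.25]                                         # toy smoothing kernel

# Perturb only token 2 and see which outputs react.
perturbed = [tok[:] for tok in tokens]
perturbed[2] = [x + 1.0 for x in perturbed[2]]

lin_a, lin_b = tokenwise_linear(tokens, weight), tokenwise_linear(perturbed, weight)
conv_a, conv_b = conv1d_same(tokens, kernel), conv1d_same(perturbed, kernel)

print(changed_rows(lin_a, lin_b))    # [2]        only the perturbed token changes
print(changed_rows(conv_a, conv_b))  # [1, 2, 3]  neighbours change as well
```

In the token-wise case a change to one token stays confined to that token, which is why such a layer cannot by itself relate or align features across neighbouring positions or frames. The convolution spreads the change to adjacent tokens, which is the kind of locality the abstract says linear projections fail to exploit.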

Code & Models

Repositories

caojiezhang/VSR-Transformer
PyTorch · Official

Taxonomy

Topics: Advanced Image Processing Techniques · Advanced Vision and Imaging · Image and Signal Denoising Methods

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Vision Transformer · Label Smoothing · Residual Connection