VRT: A Video Restoration Transformer

Jingyun Liang; Jiezhang Cao; Yuchen Fan; Kai Zhang; Rakesh Ranjan; Yawei Li; Radu Timofte; Luc Van Gool

arXiv:2201.12288·cs.CV·June 16, 2022·82 cites

Open Access · 1 Repo

TL;DR

This paper introduces VRT, a novel Video Restoration Transformer that effectively models long-range temporal dependencies and aligns frames using a parallel warping approach, significantly improving performance across multiple video restoration tasks.

Contribution

VRT is the first transformer-based model to incorporate parallel frame prediction and long-range temporal modeling for comprehensive video restoration.

Findings

01 Outperforms state-of-the-art methods by up to 2.16 dB in PSNR.

02 Effective across five video restoration tasks on fourteen datasets.

03 Demonstrates superior long-range temporal dependency modeling.
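
To put the PSNR figure in the first finding into perspective, the sketch below uses the standard definition PSNR = 10 · log10(MAX² / MSE) to convert a dB gain into the corresponding reduction in mean squared error. The function names and the example MSE value are illustrative assumptions, not taken from the paper.

```python
import math


def psnr(mse, peak=1.0):
    """Standard PSNR in dB for a given mean squared error and peak signal value."""
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks when PSNR rises by gain_db (same peak value)."""
    return 10.0 ** (-gain_db / 10.0)


# Hypothetical baseline error, chosen only for illustration.
baseline_mse = 0.001
ratio = mse_ratio_for_gain(2.16)
improved_mse = baseline_mse * ratio

print(f"baseline PSNR: {psnr(baseline_mse):.2f} dB")
print(f"improved PSNR: {psnr(improved_mse):.2f} dB")
print(f"MSE ratio for a 2.16 dB gain: {ratio:.3f}")
```

Under this definition, a 2.16 dB gain corresponds to an MSE ratio of roughly 0.61, i.e. close to a 39% reduction in mean squared error, provided both methods are scored against the same peak value.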

Abstract

Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames. Unlike single-image restoration, video restoration generally requires using temporal information from multiple adjacent but usually misaligned video frames. Existing deep methods generally tackle this with a sliding-window strategy or a recurrent architecture, which is either restricted by frame-by-frame restoration or lacks long-range modelling ability. In this paper, we propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities. More specifically, VRT is composed of multiple scales, each of which consists of two kinds of modules: temporal mutual self attention (TMSA) and parallel warping. TMSA divides the video into small clips, on which mutual attention is applied for joint…
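
The clip-wise mutual attention mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It assumes clips of two frames, omits the learned query/key/value projections (identity is used instead), and treats each frame as a short list of token vectors. The names `split_into_clips` and `mutual_attention` are hypothetical and do not come from the official repository.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def mutual_attention(ref, sup):
    """Tokens of `ref` act as queries; tokens of `sup` act as keys and values.

    Assumption: learned projections are replaced by the identity.
    """
    d = len(ref[0])
    out = []
    for q in ref:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in sup]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, sup)) for j in range(d)])
    return out


def split_into_clips(frames, clip_len=2):
    """Partition a frame sequence into non-overlapping clips."""
    return [frames[i:i + clip_len] for i in range(0, len(frames), clip_len)]


# Toy video: 4 frames, each with 3 tokens of dimension 2.
video = [[[float(t + n), float(t * n)] for n in range(3)] for t in range(4)]
clips = split_into_clips(video)

# Within each clip, every frame attends to the other frame.
fused = [
    (mutual_attention(first, second), mutual_attention(second, first))
    for first, second in clips
]
```

Because each output token is a softmax-weighted average of the other frame's tokens, a query is effectively matched against content in the neighbouring frame rather than in its own frame, which is the property that distinguishes mutual attention from ordinary self attention in this sketch.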

Code & Models

Repositories

jingyunliang/vrt
PyTorch · Official

Taxonomy

Topics: Advanced Image Processing Techniques · Image Processing Techniques and Applications · Advanced Vision and Imaging

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Residual Connection · Dense Connections · Absolute Position Encodings · Byte Pair Encoding · Dropout · Position-Wise Feed-Forward Layer