Video Frame Interpolation with Transformer

Liying Lu; Ruizheng Wu; Huaijia Lin; Jiangbo Lu; Jiaya Jia

arXiv:2205.07230 · cs.CV · May 17, 2022 · 1 citation


TL;DR

This paper introduces a Transformer-based framework for video frame interpolation that effectively models long-range pixel correlations and utilizes a cross-scale attention mechanism to improve performance, achieving state-of-the-art results.

Contribution

The paper proposes a novel Transformer-based approach with cross-scale window attention for improved long-range correlation modeling in VFI.

Findings

01 Achieves state-of-the-art results on multiple benchmarks.

02 Effectively models long-range pixel dependencies.

03 Utilizes cross-scale attention to enhance multi-scale information aggregation.

Abstract

Video frame interpolation (VFI), which aims to synthesize intermediate frames of a video, has made remarkable progress with the development of deep convolutional networks over the past years. Existing methods built upon convolutional networks generally face challenges in handling large motion due to the locality of convolution operations. To overcome this limitation, we introduce a novel framework that takes advantage of the Transformer to model long-range pixel correlation among video frames. Further, our network is equipped with a novel cross-scale window-based attention mechanism, where cross-scale windows interact with each other. This design effectively enlarges the receptive field and aggregates multi-scale information. Extensive quantitative and qualitative experiments demonstrate that our method achieves new state-of-the-art results on various benchmarks.
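
To make the idea of cross-scale windows interacting more concrete, here is a minimal NumPy sketch. It is not the authors' implementation and is not taken from the official repository. It assumes a single attention head, identity query/key/value projections, 2x2 average pooling as the coarser scale, and a simple rule that pairs each fine window with the coarse window containing it. The function names (`window_partition`, `avg_pool2`, `cross_scale_window_attention`) are hypothetical and exist only for this illustration.

```python
import numpy as np

def window_partition(x, win):
    """Split an (H, W, C) map into non-overlapping win x win windows.
    Returns (num_windows, win*win, C), windows in row-major order."""
    H, W, C = x.shape
    x = x.reshape(H // win, win, W // win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, C)

def avg_pool2(x):
    """Downsample an (H, W, C) map by 2 using 2x2 average pooling (assumed coarse scale)."""
    H, W, C = x.shape
    return x.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3))

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def cross_scale_window_attention(feat, win=4):
    """Illustrative cross-scale window attention (assumption-laden sketch).
    Queries come from each fine-scale window. Keys/values are the tokens of
    that window plus the tokens of the coarse-scale window containing it.
    A win x win coarse window spans a 2*win x 2*win fine region, so every
    query sees a larger spatial extent than its own window."""
    H, W, C = feat.shape
    assert H % (2 * win) == 0 and W % (2 * win) == 0, "H and W must be multiples of 2*win"
    fine = window_partition(feat, win)               # (nF, win*win, C)
    coarse = window_partition(avg_pool2(feat), win)  # (nF/4, win*win, C)
    n_rows, n_cols = H // win, W // win
    rows, cols = np.divmod(np.arange(n_rows * n_cols), n_cols)
    parent = (rows // 2) * (n_cols // 2) + cols // 2  # containing coarse window index
    kv = np.concatenate([fine, coarse[parent]], axis=1)  # (nF, 2*win*win, C)
    # Scaled dot-product attention, with no learned projections in this sketch.
    attn = softmax(fine @ kv.transpose(0, 2, 1) / np.sqrt(C), axis=-1)
    return attn @ kv, attn

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 16, 8))  # toy feature map, not real video features
out, attn = cross_scale_window_attention(feat, win=4)
print(out.shape, attn.shape)
```

In this toy setup, each of the 16 fine windows holds 16 query tokens that attend over 32 key/value tokens: 16 from the window itself and 16 from the coarser scale. Those coarse tokens summarize a region four times larger in area, which is one simple way a window-based design can enlarge its receptive field without attending over the whole frame.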

Code & Models

Repositories

dvlab-research/vfiformer (PyTorch, official)

Taxonomy

Topics: Advanced Image Processing Techniques · Advanced Vision and Imaging · Image Processing Techniques and Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Adam · Byte Pair Encoding · Residual Connection · Label Smoothing · Position-Wise Feed-Forward Layer · Absolute Position Encodings