Cross-Attention Transformer for Video Interpolation

Hannah Halin Kim; Shuzhi Yu; Shuai Yuan; Carlo Tomasi

arXiv:2207.04132 · cs.CV · December 5, 2022

TL;DR

This paper introduces TAIN, a transformer-based residual network for video frame interpolation that uses Cross Similarity and Image Attention modules to improve accuracy without flow estimation.

Contribution

The paper presents a new transformer module, Cross Similarity, and an Image Attention mechanism for efficient, flow-free video interpolation.

Findings

01 Outperforms existing flow-free methods on the Vimeo90k, UCF101, and SNU-FILM benchmarks

02 Performs comparably to flow-based methods

03 Is computationally efficient at inference time

Abstract

We propose TAIN (Transformers and Attention for video INterpolation), a residual neural network for video interpolation, which aims to interpolate an intermediate frame given two consecutive image frames around it. We first present a novel vision transformer module, named Cross Similarity (CS), to globally aggregate input image features with similar appearance as those of the predicted interpolated frame. These CS features are then used to refine the interpolated prediction. To account for occlusions in the CS features, we propose an Image Attention (IA) module to allow the network to focus on CS features from one frame over those of the other. TAIN outperforms existing methods that do not require flow estimation and performs comparably to flow-based methods while being computationally efficient in terms of inference time on Vimeo90k, UCF101, and SNU-FILM benchmarks.
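The two modules described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the layers, so the sketch assumes plain scaled dot-product cross-attention with no learned projections for Cross Similarity, and a per-location softmax over the two frames for Image Attention. The function names `cross_similarity` and `image_attention`, the tensor shapes, and the random `logits` (standing in for scores a network would learn) are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_similarity(pred_feat, frame_feat):
    """Aggregate input-frame features that look like the predicted frame.

    Assumption: generic cross-attention, where queries come from the current
    interpolated prediction and keys/values come from one input frame.
    pred_feat:  (N, C) features of the predicted intermediate frame
    frame_feat: (M, C) features of one input frame
    returns:    (N, C) similarity-weighted average of frame_feat rows
    """
    scale = np.sqrt(pred_feat.shape[-1])
    weights = softmax(pred_feat @ frame_feat.T / scale, axis=-1)  # (N, M)
    return weights @ frame_feat


def image_attention(cs0, cs1, logits):
    """Blend the CS features of the two frames per location.

    Assumption: a softmax over two per-location scores, so the blend can
    favor one frame where the other is occluded.
    logits: (N, 2) scores; a real network would predict these.
    """
    w = softmax(logits, axis=-1)
    return w[:, :1] * cs0 + w[:, 1:] * cs1


rng = np.random.default_rng(0)
N, C = 6, 4                       # 6 spatial locations, 4 channels (toy sizes)
pred = rng.normal(size=(N, C))    # features of the current prediction
f0 = rng.normal(size=(N, C))      # features of the first input frame
f1 = rng.normal(size=(N, C))      # features of the second input frame

cs0 = cross_similarity(pred, f0)
cs1 = cross_similarity(pred, f1)
logits = rng.normal(size=(N, 2))  # hypothetical stand-in for learned scores
fused = image_attention(cs0, cs1, logits)
print(fused.shape)
```

In this sketch the fused features would then feed whatever refinement step updates the interpolated prediction. Because the attention weights are computed over all locations of the input frame, the aggregation is global and needs no explicit flow field, which matches the flow-free framing of the abstract.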

Code & Models

Repositories

hannahhalin/tain (PyTorch, official)

Taxonomy

Topics: Advanced Image Processing Techniques · Advanced Vision and Imaging · Video Coding and Compression Technologies

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Linear Layer · Dense Connections · Residual Connection · Layer Normalization · Vision Transformer