Video Frame Interpolation with Flow Transformer
Pan Gao, Haoyue Tian, Jie Qin

TL;DR
This paper introduces a Flow Transformer-based framework for video frame interpolation that captures motion dynamics and long-range pixel dependencies, yielding higher-quality interpolated frames, especially under large motion.
Contribution
The paper develops a Flow Transformer Block that incorporates optical flow-guided temporal self-attention, addressing the tendency of convolutional methods to lose detail and miss long-range dependencies.
Findings
Outperforms state-of-the-art methods on three benchmarks
Effectively handles large motion in video frames
Maintains low complexity by using a local attention mechanism
Abstract
Video frame interpolation has been actively studied with the development of convolutional neural networks. However, due to the intrinsic limitations of kernel weight sharing in convolution, the interpolated frame generated by it may lose details. In contrast, the attention mechanism in Transformer can better distinguish the contribution of each pixel, and it can also capture long-range pixel dependencies, which provides great potential for video interpolation. Nevertheless, the original Transformer is commonly used for 2D images; how to develop a Transformer-based framework with consideration of temporal self-attention for video frame interpolation remains an open issue. In this paper, we propose Video Frame Interpolation Flow Transformer to incorporate motion dynamics from optical flows into the self-attention mechanism. Specifically, we design a Flow Transformer Block that calculates…
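The abstract is cut off before it describes the Flow Transformer Block in detail, so the following is only a minimal sketch of one plausible reading of "optical flow-guided temporal self-attention", not the authors' implementation. It assumes that each query pixel uses its flow vector to pick the centre of a small window in a neighbouring frame and then attends only to the keys inside that window. The function name `flow_guided_local_attention`, the `window` parameter, and the nearest-pixel rounding of the flow are all hypothetical choices made for illustration.

```python
import math


def flow_guided_local_attention(query, keys, values, flow, window=1):
    """Illustrative flow-guided local attention (hypothetical, not the paper's code).

    query:  H x W x C nested lists, features of the frame being synthesized
    keys:   H x W x C nested lists, features of a neighbouring frame
    values: H x W x D nested lists, values taken from the same neighbouring frame
    flow:   H x W entries of (dx, dy), assumed to point from each query pixel
            to its corresponding location in the neighbouring frame
    window: radius of the square attention window around the displaced centre
    """
    height, width = len(query), len(query[0])
    channels = len(query[0][0])
    scale = 1.0 / math.sqrt(channels)  # usual scaled dot-product factor
    out = [[None] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            # Assumption: round the displaced position to the nearest pixel
            # and clamp it so the window centre always lies inside the frame.
            cx = min(max(int(round(x + dx)), 0), width - 1)
            cy = min(max(int(round(y + dy)), 0), height - 1)

            scores, window_values = [], []
            for ny in range(cy - window, cy + window + 1):
                for nx in range(cx - window, cx + window + 1):
                    if 0 <= ny < height and 0 <= nx < width:
                        dot = sum(q * k for q, k in zip(query[y][x], keys[ny][nx]))
                        scores.append(dot * scale)
                        window_values.append(values[ny][nx])

            # Numerically stable softmax over the keys inside the window.
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            depth = len(window_values[0])
            out[y][x] = [
                sum(w * v[d] for w, v in zip(weights, window_values)) / total
                for d in range(depth)
            ]
    return out
```

Under these assumptions each query compares against at most (2 * window + 1)^2 keys rather than all H * W positions, which is the kind of saving a local attention mechanism gives, while the flow offset lets the small window follow large displacements.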
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Image Processing Techniques and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Position-Wise Feed-Forward Layer · Layer Normalization · Linear Layer · Dense Connections · Label Smoothing · Dropout · Adam
