Efficient Convolution and Transformer-Based Network for Video Frame Interpolation
Issa Khalifeh, Luka Murn, Marta Mrak, Ebroul Izquierdo

TL;DR
This paper presents a novel video frame interpolation network combining convolutional and transformer encoders, significantly reducing memory usage and inference time while maintaining competitive accuracy on complex motion benchmarks.
Contribution
The proposed dual-encoder network effectively integrates convolutional and transformer features, reducing memory and computation costs compared to existing transformer-based methods.
Findings
Reduces memory usage by nearly 50% compared to existing transformer-based interpolation methods.
Runs up to four times faster during inference than those methods.
Achieves competitive performance on complex motion benchmarks.
Abstract
Video frame interpolation is an increasingly important research task with several key industrial applications in the video coding, broadcast and production sectors. Recently, transformers have been introduced to the field, resulting in substantial performance gains. However, these gains come at the cost of greatly increased memory usage and longer training and inference times. In this paper, a novel method integrating a transformer encoder and convolutional features is proposed. This network reduces the memory burden by close to 50% and runs up to four times faster during inference compared to existing transformer-based interpolation methods. A dual-encoder architecture is introduced which combines the strength of convolutions in modelling local correlations with that of the transformer in modelling long-range dependencies. Quantitative evaluations are conducted on various benchmarks with complex motion to…
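The dual-encoder idea in the abstract, a convolutional branch for local correlations alongside a transformer branch for long-range dependencies, can be illustrated with a toy one-dimensional sketch. This is not the authors' network: the function names (`conv1d_same`, `self_attention`, `dual_encoder`), the scalar tokens, the identity query/key/value projections, and the fusion by simple pairing are all assumptions made for illustration, written in plain Python so the block runs without extra libraries.

```python
import math

def conv1d_same(x, kernel):
    """Local branch: sliding-window weighted sum with zero padding.

    Each output depends only on a small neighbourhood of the input,
    and the output has the same length as the input.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):  # positions outside the signal count as zero
                acc += w * x[j]
        out.append(acc)
    return out

def self_attention(x):
    """Global branch: single-head dot-product attention over scalar tokens.

    Assumption: queries, keys and values are the raw inputs (identity
    projections), so every output is a softmax-weighted average of ALL inputs.
    """
    out = []
    for q in x:
        scores = [q * k for k in x]            # token dimension is 1, so no scaling
        m = max(scores)                        # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, x)) / total)
    return out

def dual_encoder(x, kernel):
    """Hypothetical fusion: pair the local and global feature at each position."""
    local = conv1d_same(x, kernel)
    global_ = self_attention(x)
    return list(zip(local, global_))

signal = [0.0, 1.0, 0.0, 0.0, 2.0]
features = dual_encoder(signal, kernel=[0.25, 0.5, 0.25])
for position, (local, global_) in enumerate(features):
    print(position, round(local, 3), round(global_, 3))
```

In this sketch the local feature at each position sees at most three neighbouring samples, while the global feature mixes in every sample, including the distant peak at the end of the signal. The attention branch also does work that grows with the square of the sequence length, which is the kind of cost the paper aims to reduce; the specific mechanism used there is not described in this summary.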
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Image Processing Techniques and Applications
