VFIMamba: Video Frame Interpolation with State Space Models
Guozhen Zhang, Chunxu Liu, Yutao Cui, Xiaotong Zhao, Kai Ma, Limin Wang

TL;DR
VFIMamba introduces a novel video frame interpolation method leveraging state space models for efficient, dynamic, and high-quality intermediate frame generation, especially excelling in high-resolution videos.
Contribution
It proposes the Mixed-SSM Block and a curriculum learning strategy to enhance inter-frame modeling using S6 models, achieving state-of-the-art results.
Findings
Achieves state-of-the-art performance on multiple benchmarks.
Improves PSNR by 0.80 dB on 4K frames.
Efficiently models long-range temporal dependencies.
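To put the PSNR figure in context, the sketch below computes PSNR with the standard definition and converts a dB gain into the equivalent change in mean squared error. The function names and the 8-bit peak value of 255 are assumptions made for illustration; the page does not say how the authors evaluate.

```python
import numpy as np


def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB (max_val=255 assumes 8-bit images)."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """MSE of the better method divided by MSE of the baseline for a given PSNR gain."""
    return 10.0 ** (-gain_db / 10.0)


# Toy example: a constant error of 1 gray level gives MSE = 1.
target = np.zeros((4, 4), dtype=np.uint8)
pred = np.ones((4, 4), dtype=np.uint8)
toy_psnr = psnr(pred, target)

# A 0.80 dB gain corresponds to an MSE ratio of about 0.83.
ratio = mse_ratio_for_gain(0.80)
```

Under this definition, a gain of 0.80 dB means the mean squared error drops by roughly 17% relative to the baseline.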
Abstract
Inter-frame modeling is pivotal in generating intermediate frames for video frame interpolation (VFI). Current approaches predominantly rely on convolution or attention-based models, which often either lack sufficient receptive fields or entail significant computational overheads. Recently, Selective State Space Models (S6) have emerged, tailored specifically for long sequence modeling, offering both linear complexity and data-dependent modeling capabilities. In this paper, we propose VFIMamba, a novel frame interpolation method for efficient and dynamic inter-frame modeling by harnessing the S6 model. Our approach introduces the Mixed-SSM Block (MSB), which initially rearranges tokens from adjacent frames in an interleaved fashion and subsequently applies multi-directional S6 modeling. This design facilitates the efficient transmission of information across frames while upholding…
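The interleaved rearrangement described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the (H, W, C) token layout, and the particular scan orders are assumptions, and the S6 layer that would consume each sequence is omitted.

```python
import numpy as np


def interleave_tokens(frame0, frame1):
    """Merge two (H, W, C) token grids into one sequence [f0[0], f1[0], f0[1], f1[1], ...]."""
    assert frame0.shape == frame1.shape
    h, w, c = frame0.shape
    t0 = frame0.reshape(h * w, c)  # row-major flatten of frame 0
    t1 = frame1.reshape(h * w, c)  # row-major flatten of frame 1
    # Stack as (N, 2, C), then flatten so tokens from the two frames alternate.
    return np.stack([t0, t1], axis=1).reshape(2 * h * w, c)


def deinterleave_tokens(seq, h, w):
    """Inverse of interleave_tokens: recover the two (H, W, C) grids."""
    c = seq.shape[-1]
    pairs = seq.reshape(h * w, 2, c)
    return pairs[:, 0].reshape(h, w, c), pairs[:, 1].reshape(h, w, c)


def scan_orders(frame0, frame1):
    """Assumed multi-directional scans: row-major and column-major, each forward and reversed."""
    rows = interleave_tokens(frame0, frame1)
    cols = interleave_tokens(frame0.transpose(1, 0, 2), frame1.transpose(1, 0, 2))
    return [rows, rows[::-1], cols, cols[::-1]]


frame0 = np.zeros((2, 3, 4))  # toy tokens from the first frame
frame1 = np.ones((2, 3, 4))   # toy tokens from the second frame
seq = interleave_tokens(frame0, frame1)
back0, back1 = deinterleave_tokens(seq, 2, 3)
orders = scan_orders(frame0, frame1)
```

With this layout, tokens at the same spatial position in the two frames sit next to each other in the sequence, which is one plausible reading of how interleaving shortens the path information must travel between frames in a sequential scan.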
Taxonomy
Topics: Video Coding and Compression Technologies · Advanced Vision and Imaging · Advanced Image Processing Techniques
Methods: Convolution
