VFIMamba: Video Frame Interpolation with State Space Models

Guozhen Zhang; Chunxu Liu; Yutao Cui; Xiaotong Zhao; Kai Ma; Limin Wang

arXiv:2407.02315 · cs.CV · October 11, 2024

TL;DR

VFIMamba is a video frame interpolation method that uses selective state space models (S6) for efficient, data-dependent inter-frame modeling, and it performs especially well on high-resolution video.

Contribution

The paper proposes the Mixed-SSM Block (MSB) and a curriculum learning strategy, both aimed at strengthening S6-based inter-frame modeling, and reports state-of-the-art results.
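
For readers unfamiliar with S6, the recurrence behind a selective state space model can be sketched in a few lines. This is a scalar toy, not the paper's implementation: the function name `selective_scan`, the weights `w_delta`, `w_b`, `w_c`, and the simplified discretization are all assumptions made for illustration. It only shows the two properties the summary relies on: the step size and the input/output projections depend on the current token, and the whole sequence is processed in a single linear-time pass.

```python
import math


def softplus(z):
    """Smooth positive function, used here to keep the step size positive."""
    return math.log1p(math.exp(z))


def selective_scan(xs, a=-1.0, w_delta=1.0, w_b=1.0, w_c=1.0):
    """Scalar toy selective scan (hypothetical parameters, illustration only).

    Each step updates a single hidden state h, so the cost is O(len(xs)).
    """
    h = 0.0
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)  # input-dependent step size
        a_bar = math.exp(delta * a)    # discretized decay; in (0, 1) when a < 0
        b_bar = delta * (w_b * x)      # input-dependent input projection (simplified)
        h = a_bar * h + b_bar * x      # state update
        ys.append((w_c * x) * h)       # input-dependent output projection
    return ys
```

Because `delta`, `b_bar`, and the output weight are recomputed from every token, the model can decide per token how much past state to keep, which is what "data-dependent" refers to.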

Findings

01 Achieves state-of-the-art performance on multiple benchmarks.

02 Improves PSNR by 0.80 dB on 4K frames.

03 Efficiently models long-range temporal dependencies.
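
To put the 0.80 dB figure in context, the sketch below uses the standard PSNR definition, 10 · log10(MAX² / MSE). The page does not say how the authors evaluate PSNR, so the helper names `psnr` and `mse_ratio_for_gain` and the 8-bit peak value of 255 are assumptions for illustration only.

```python
import math


def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks when PSNR rises by gain_db."""
    return 10.0 ** (-gain_db / 10.0)
```

Under this definition, `mse_ratio_for_gain(0.80)` is about 0.83, so a 0.80 dB gain corresponds to roughly 17% lower mean squared error.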

Abstract

Inter-frame modeling is pivotal in generating intermediate frames for video frame interpolation (VFI). Current approaches predominantly rely on convolution or attention-based models, which often either lack sufficient receptive fields or entail significant computational overheads. Recently, Selective State Space Models (S6) have emerged, tailored specifically for long sequence modeling, offering both linear complexity and data-dependent modeling capabilities. In this paper, we propose VFIMamba, a novel frame interpolation method for efficient and dynamic inter-frame modeling by harnessing the S6 model. Our approach introduces the Mixed-SSM Block (MSB), which initially rearranges tokens from adjacent frames in an interleaved fashion and subsequently applies multi-directional S6 modeling. This design facilitates the efficient transmission of information across frames while upholding…
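
The interleaved rearrangement described in the abstract can be illustrated with plain lists. This is a minimal sketch under stated assumptions, not the authors' code: the function names are hypothetical, tokens are assumed to come from a row-major flattened H x W grid, and the choice of four scan directions (horizontal, vertical, and their reverses) is an assumption, since the text here only says "multi-directional".

```python
def interleave(tokens_a, tokens_b):
    """Merge two equal-length token lists as a0, b0, a1, b1, ..."""
    if len(tokens_a) != len(tokens_b):
        raise ValueError("both frames must contribute the same number of tokens")
    merged = []
    for a, b in zip(tokens_a, tokens_b):
        merged.extend((a, b))
    return merged


def column_major(tokens, height, width):
    """Reorder a row-major flattened H x W grid into column-major order."""
    return [tokens[r * width + c] for c in range(width) for r in range(height)]


def scan_orders(frame_a, frame_b, height, width):
    """Return four assumed scan sequences over the interleaved tokens."""
    horizontal = interleave(frame_a, frame_b)
    vertical = interleave(
        column_major(frame_a, height, width),
        column_major(frame_b, height, width),
    )
    return {
        "horizontal": horizontal,
        "horizontal_reversed": horizontal[::-1],
        "vertical": vertical,
        "vertical_reversed": vertical[::-1],
    }


# Two 2 x 2 frames with labelled tokens.
orders = scan_orders(["a0", "a1", "a2", "a3"], ["b0", "b1", "b2", "b3"], 2, 2)
print(orders["horizontal"])  # ['a0', 'b0', 'a1', 'b1', 'a2', 'b2', 'a3', 'b3']
print(orders["vertical"])    # ['a0', 'b0', 'a2', 'b2', 'a1', 'b1', 'a3', 'b3']
```

Interleaving places each token directly next to the token at the same spatial position in the other frame, so a sequential model sees cross-frame neighbours at adjacent steps rather than a full frame apart.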

Code & Models

Repositories

mcg-nju/vfimamba (PyTorch, official)

Taxonomy

Topics: Video Coding and Compression Technologies · Advanced Vision and Imaging · Advanced Image Processing Techniques

Methods: Convolution