Semantic Frame Interpolation
Yijia Hong, Jiangning Zhang, Ran Yi, Yuji Wang, Weijian Cao, Xiaobin Hu, Zhucun Xue, Yabiao Wang, Chengjie Wang, Lizhuang Ma

TL;DR
This paper introduces Semantic Frame Interpolation (SFI), a new task for generating intermediate video content of varying lengths between given first and last frames under text-prompt control, along with a model (SemFi) and a dedicated benchmark dataset (SFI-300K) for evaluating performance across multiple frame rates.
Contribution
The paper defines the SFI task, proposes the SemFi model with a Mixture-of-LoRA module, and introduces SFI-300K, the first dataset and benchmark for this task.
Findings
The SemFi model achieves high consistency with control conditions.
SFI-300K enables comprehensive evaluation of frame interpolation methods.
The approach supports inference at multiple frame rates.
Abstract
Generating intermediate video content of varying lengths based on given first and last frames, along with text prompt information, offers significant research and application potential. However, traditional frame interpolation tasks primarily focus on scenarios with a small number of frames, no text control, and minimal differences between the first and last frames. Recent community developers have utilized large video models represented by Wan to endow frame-to-frame capabilities. However, these models can only generate a fixed number of frames and often fail to produce satisfactory results for certain frame lengths, while this setting lacks a clear official definition and a well-established benchmark. In this paper, we first propose a new practical Semantic Frame Interpolation (SFI) task from the perspective of academic definition, which covers the above two settings and supports…
