EF-VI: Enhancing End-Frame Injection for Video Inbetweening
Liuhan Chen, Xiaodong Cun, Xiaoyu Li, Xianyi He, Shenghai Yuan, Jie Chen, Ying Shan, Li Yuan

TL;DR
EF-VI introduces an end-frame injection method for video inbetweening that strengthens the end-frame constraint without disrupting the input representation of video frames, using a lightweight module called EF-Net.
Contribution
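The paper proposes EF-VI, a video inbetweening framework for transformer-based Image-to-Video Diffusion Models (I2V-DMs). Its EF-Net module enforces the end-frame constraint more strongly while leaving the input representation of video frames undisturbed.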
Findings
Outperforms existing methods in video inbetweening quality
Efficiently encodes end frames with a lightweight module
Demonstrates superior results across extensive experiments
Abstract
Video inbetweening aims to synthesize intermediate video sequences conditioned on the given start and end frames. Current state-of-the-art methods primarily extend large-scale pre-trained Image-to-Video Diffusion Models (I2V-DMs) by incorporating the end-frame condition via direct fine-tuning or temporally bidirectional sampling. However, the former results in a weak end-frame constraint, while the latter inevitably disrupts the input representation of video frames, leading to suboptimal performance. To improve the end-frame constraint while avoiding disruption of the input representation, we propose a novel video inbetweening framework specific to recent and more powerful transformer-based I2V-DMs, termed EF-VI. It efficiently strengthens the end-frame constraint by utilizing an enhanced injection. This is based on our proposed well-designed lightweight module, termed EF-Net, which…
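To make the task setup in the abstract concrete, the following is a minimal sketch of the inbetweening interface only: two keyframes in, a full frame sequence out. It is not the EF-VI method or EF-Net. It substitutes a naive linear cross-fade for the diffusion model, and the names `Frame`, `crossfade_inbetween`, and `keyframe_mask` are hypothetical, introduced purely for illustration.

```python
from typing import List

# Hypothetical frame type for illustration: a grayscale image as rows of intensities.
Frame = List[List[float]]


def crossfade_inbetween(start: Frame, end: Frame, num_frames: int) -> List[Frame]:
    """Naive baseline (NOT EF-VI): linearly blend from `start` to `end`.

    The returned sequence has `num_frames` frames; the first equals `start`
    and the last equals `end`, which is the constraint an inbetweening
    model is expected to satisfy.
    """
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    video: List[Frame] = []
    for t in range(num_frames):
        alpha = t / (num_frames - 1)  # 0.0 at the start frame, 1.0 at the end frame
        frame = [
            [(1 - alpha) * s + alpha * e for s, e in zip(row_s, row_e)]
            for row_s, row_e in zip(start, end)
        ]
        video.append(frame)
    return video


def keyframe_mask(num_frames: int) -> List[int]:
    """Mark which temporal positions are given (1) versus to be synthesized (0)."""
    return [1 if t in (0, num_frames - 1) else 0 for t in range(num_frames)]


if __name__ == "__main__":
    clip = crossfade_inbetween([[0.0, 2.0]], [[4.0, 6.0]], num_frames=5)
    print(len(clip), keyframe_mask(5))
```

A cross-fade satisfies both endpoint constraints trivially but produces no real motion, which is why the methods discussed in the abstract build on pre-trained I2V-DMs instead.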
Taxonomy
Methods: Diffusion
