Can Video Diffusion Models Predict Past Frames? Bidirectional Cycle Consistency for Reversible Interpolation
Lingyu Liu, Yaxiong Wang, Li Zhu, Zhedong Zheng

TL;DR
This paper introduces a bidirectional cycle-consistent framework for video frame interpolation using video diffusion models, improving temporal consistency and motion accuracy without extra inference cost.
Contribution
The paper proposes a reversible interpolation method that combines learnable directional tokens with a cycle-consistency constraint between forward and backward generation, improving stability and accuracy on long-range sequences.
Findings
Achieves state-of-the-art results on 37-frame and 73-frame tasks.
Outperforms baselines in image quality, motion smoothness, and dynamic control.
Maintains inference efficiency despite bidirectional training.
Abstract
Video frame interpolation aims to synthesize realistic intermediate frames between given endpoints while adhering to specific motion semantics. While recent generative models have improved visual fidelity, they predominantly operate in a unidirectional manner, lacking mechanisms to self-verify temporal consistency. This often leads to motion drift, directional ambiguity, and boundary misalignment, especially in long-range sequences. Inspired by the principle of temporal cycle-consistency in self-supervised learning, we propose a novel bidirectional framework that enforces symmetry between forward and backward generation trajectories. Our approach introduces learnable directional tokens to explicitly condition a shared backbone on temporal orientation, enabling the model to jointly optimize forward synthesis and backward reconstruction within a single unified architecture. This…
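The idea of enforcing symmetry between forward and backward generation can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: `toy_backbone` is a linear blend standing in for the shared video diffusion backbone, `direction_tokens` are random vectors standing in for the learnable directional tokens, and the mean-squared penalty is one plausible form of a cycle-consistency term, not the loss used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_FRAMES = 5  # number of intermediate frames (toy value, not from the paper)
DIM = 8         # size of a flattened toy "frame" (toy value)

# Hypothetical directional tokens: one vector per temporal orientation.
# In a real model these would be trained parameters.
direction_tokens = {
    "forward": rng.normal(scale=0.1, size=DIM),
    "backward": rng.normal(scale=0.1, size=DIM),
}


def toy_backbone(first, last, token):
    """Stand-in for the shared backbone (an assumption, not the real model).

    Blends the two endpoint frames linearly and adds a token-dependent
    offset that is largest at the temporal midpoint.
    """
    # Interior timestamps in (0, 1), shape (NUM_FRAMES, 1).
    t = np.linspace(0.0, 1.0, NUM_FRAMES + 2)[1:-1, None]
    bump = np.sin(np.pi * t)
    return (1.0 - t) * first + t * last + bump * token


def cycle_consistency_loss(start, end):
    """Compare the forward pass with the time-reversed backward pass."""
    forward = toy_backbone(start, end, direction_tokens["forward"])
    backward = toy_backbone(end, start, direction_tokens["backward"])
    # Reversing the backward sequence along time aligns it with the forward one.
    loss = float(np.mean((forward - backward[::-1]) ** 2))
    return loss, forward, backward


start_frame = np.zeros(DIM)
end_frame = np.ones(DIM)
loss, forward, backward = cycle_consistency_loss(start_frame, end_frame)
print("cycle-consistency penalty:", loss)
```

In this toy setting the penalty vanishes only when the two directions describe the same trajectory, which is the self-verification signal the abstract says unidirectional models lack. Both directions call the same function, so inference in a single direction needs only one pass.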
