TL;DR
TLB-VFI introduces an efficient, temporal-aware diffusion model for video frame interpolation that significantly improves quality, reduces parameters, and speeds up inference by leveraging novel temporal encoding techniques.
Contribution
The paper proposes TLB-VFI, a novel temporal-aware latent diffusion model that enhances video frame interpolation efficiency and performance with fewer parameters and training data.
Findings
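Achieves a 20% improvement in FID on the most challenging datasets over recent image-based diffusion models.
Uses 3x fewer parameters, which yields 2.3x faster inference.
With optical flow guidance, requires 9000x less training data than video-based diffusion models.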
Abstract
Video Frame Interpolation (VFI) aims to predict the intermediate frame I_n (we use n to denote time in videos to avoid notation overload with the timestep t in diffusion models) based on two consecutive neighboring frames I_0 and I_1. Recent approaches apply diffusion models (both image-based and video-based) in this task and achieve strong performance. However, image-based diffusion models are unable to extract temporal information and are relatively inefficient compared to non-diffusion methods. Video-based diffusion models can extract temporal information, but they are too large in terms of training scale, model size, and inference time. To mitigate the above issues, we propose Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation (TLB-VFI), an efficient video-based diffusion model. By extracting rich temporal information from video inputs through our…
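To make the "Brownian Bridge Diffusion" in the method's name more concrete, here is a minimal sketch of a generic Brownian bridge forward process between two fixed endpoints. It is not taken from the paper. The function name `bridge_sample`, the scale `s`, and the variance schedule `2 * s * (m - m^2)` are assumptions borrowed from the general Brownian bridge diffusion literature, and the vectors stand in for hypothetical latent codes. The schedule TLB-VFI actually uses may differ.

```python
import math
import random


def bridge_sample(x0, y, t, T, rng, s=1.0):
    """Sample x_t from a Brownian bridge pinned at x0 (t = 0) and y (t = T).

    x0, y : equal-length lists of floats (hypothetical latent vectors)
    t, T  : current step and total number of steps, with 0 <= t <= T
    rng   : a random.Random instance, passed in for reproducibility
    s     : assumed scale factor for the bridge variance
    """
    m = t / T                              # weight: 0 at t = 0, 1 at t = T
    var = max(0.0, 2.0 * s * (m - m * m))  # zero at both endpoints, largest at m = 0.5
    std = math.sqrt(var)
    return [
        (1.0 - m) * a + m * b + std * rng.gauss(0.0, 1.0)
        for a, b in zip(x0, y)
    ]


rng = random.Random(0)
start = [0.0, 1.0, 2.0]   # hypothetical latent at t = 0
end = [1.0, 1.0, 1.0]     # hypothetical latent at t = T
midpoint = bridge_sample(start, end, t=5, T=10, rng=rng)
```

Because the variance vanishes at both ends, the process starts exactly at one endpoint and ends exactly at the other. This differs from a standard diffusion process, which ends in pure Gaussian noise.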
