Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation
Yang Cao, Zhao Song, Chiwun Yang

TL;DR
This paper introduces Video Latent Flow Matching (VLFM), a novel method that uses polynomial projections in latent space for efficient video interpolation and extrapolation, leveraging pre-trained image models and the HiPPO framework.
Contribution
The paper proposes a new approach combining polynomial projections and the HiPPO framework for improved video modeling and frame rate flexibility.
Findings
Effective video interpolation and extrapolation demonstrated on multiple datasets.
Theoretical guarantees of bounded approximation error and robustness.
Compatibility with pre-trained image generation models.
Abstract
This paper considers an efficient video modeling process called Video Latent Flow Matching (VLFM). Unlike prior works, which randomly sample latent patches for video generation, our method relies on current strong pre-trained image generation models, modeling a caption-guided flow of latent patches that can be decoded into time-dependent video frames. We first hypothesize that the frames of a video are differentiable with respect to time in some latent space. Based on this conjecture, we introduce the HiPPO framework to approximate the optimal polynomial projection used to generate the probability path. Our approach gains the theoretical benefits of a bounded universal approximation error and timescale robustness. Moreover, VLFM possesses interpolation and extrapolation abilities for video generation at arbitrary frame rates. We conduct experiments on several text-to-video…
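The idea of projecting a latent trajectory onto polynomials and then querying it at arbitrary times can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes per-frame latents are already available as vectors, and it swaps in a batch least-squares Legendre fit from NumPy in place of HiPPO, which maintains such coefficients through an online recurrence. The function names `fit_latent_trajectory` and `eval_latent_trajectory`, the time range [0, 1], and the polynomial degree are all hypothetical choices made for illustration.

```python
import numpy as np
from numpy.polynomial import legendre as L


def fit_latent_trajectory(times, latents, degree):
    """Fit one Legendre series per latent dimension by least squares.

    times:   shape (T,), frame timestamps, assumed to lie in [0, 1]
    latents: shape (T, D), one latent vector per frame (assumed given)
    Returns coefficients of shape (degree + 1, D).
    """
    x = 2.0 * np.asarray(times, dtype=float) - 1.0  # map [0, 1] to [-1, 1]
    return L.legfit(x, np.asarray(latents, dtype=float), degree)


def eval_latent_trajectory(coeffs, times):
    """Evaluate the fitted trajectory at arbitrary timestamps.

    Returns an array of shape (len(times), D).
    """
    x = 2.0 * np.asarray(times, dtype=float) - 1.0
    # legval returns shape (D, len(times)) for 2-D coefficients; transpose it.
    return L.legval(x, coeffs).T


# Toy usage: 8 observed "frames" of a 4-dimensional latent.
rng = np.random.default_rng(0)
observed_t = np.linspace(0.0, 1.0, 8)
observed_z = rng.normal(size=(8, 4))
coeffs = fit_latent_trajectory(observed_t, observed_z, degree=3)

# Interpolation: query at a higher frame rate inside [0, 1].
dense_z = eval_latent_trajectory(coeffs, np.linspace(0.0, 1.0, 30))

# Extrapolation: query slightly beyond the observed range.
future_z = eval_latent_trajectory(coeffs, np.array([1.05, 1.10]))
```

In a full pipeline the queried latents would be passed to an image decoder to produce frames; that step is omitted here. Polynomial fits generally lose accuracy quickly outside the fitted interval, so the extrapolation query above should be read only as an illustration of the interface, not as evidence of extrapolation quality.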
Taxonomy
Topics: Advanced Image Processing Techniques · Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis
