Flowception: Temporally Expansive Flow Matching for Video Generation
Tariq Berrada Ifriqi, John Nguyen, Karteek Alahari, Jakob Verbeek, Ricky T. Q. Chen

TL;DR
Flowception is a non-autoregressive, variable-length video generation framework that interleaves frame insertion with frame denoising. It reports better FVD and VBench scores than autoregressive and full-sequence baselines, and a three-fold reduction in training FLOPs relative to full-sequence flows.
Contribution
It proposes a non-autoregressive approach that learns video length jointly with video content and reduces training cost relative to full-sequence flows.
Findings
Achieves better FVD and VBench scores than autoregressive and full-sequence baselines.
Reduces training FLOPs three-fold compared to full-sequence flows.
Supports seamless integration of image-to-video generation and interpolation.
Abstract
We present Flowception, a novel non-autoregressive and variable-length video generation framework. Flowception learns a probability path that interleaves discrete frame insertions with continuous frame denoising. Compared to autoregressive methods, Flowception alleviates error accumulation/drift, as the frame insertion mechanism during sampling serves as an efficient compression mechanism to handle long-term context. Compared to full-sequence flows, our method reduces training FLOPs three-fold, while also being more amenable to local attention variants and allowing the length of videos to be learned jointly with their content. Quantitative experimental results show improved FVD and VBench metrics over autoregressive and full-sequence baselines, which is further validated with qualitative results. Finally, by learning to insert and denoise frames in a sequence, Flowception seamlessly…
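As a rough illustration of what "interleaving discrete frame insertions with continuous frame denoising" could look like at sampling time, here is a toy sketch. It is not the authors' algorithm. Each frame is a single scalar, the clean value `target` is assumed known, the velocity is the closed-form one for a linear noise-to-data path, and insertion is decided by a coin flip (`insert_prob`) where the real method would use a learned model. The function name and all parameters are hypothetical.

```python
import random


def toy_insert_and_denoise(n_steps=10, insert_prob=0.3, max_frames=8, seed=0):
    """Toy sampler: alternate Euler denoising steps with random frame insertions.

    Assumptions (not from the paper): frames are scalars, the clean value is
    known, and insertions are random rather than predicted by a network.
    """
    rng = random.Random(seed)
    target = 1.0  # hypothetical "clean" frame value
    # Each frame keeps its own progress counter k, so its time is t = k / n_steps.
    frames = [{"x": rng.gauss(0.0, 1.0), "k": 0}]

    while True:
        # Continuous part: one Euler step for every frame that is not yet clean.
        for f in frames:
            if f["k"] < n_steps:
                t = f["k"] / n_steps
                # Velocity of the linear path x_t = (1 - t) * noise + t * target.
                velocity = (target - f["x"]) / (1.0 - t)
                f["x"] += velocity / n_steps
                f["k"] += 1

        # Discrete part: possibly insert a fresh noise frame at a random position.
        inserted = False
        if len(frames) < max_frames and rng.random() < insert_prob:
            pos = rng.randint(0, len(frames))  # pos == len(frames) appends
            frames.insert(pos, {"x": rng.gauss(0.0, 1.0), "k": 0})
            inserted = True

        # Stop once every frame is fully denoised and nothing new was inserted.
        if not inserted and all(f["k"] == n_steps for f in frames):
            return frames


if __name__ == "__main__":
    result = toy_insert_and_denoise()
    print(len(result), [round(f["x"], 3) for f in result])
```

In this sketch, frames inserted later start from pure noise while earlier frames are already partly denoised, so the sequence length is decided during sampling rather than fixed in advance.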
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Music Technology and Sound Studies · Human Motion and Animation
