FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching

Jangho Park; Geon Yeong Park; Gihyun Kwon; Jong Chul Ye

arXiv:2605.20910 · cs.CV · May 21, 2026

TL;DR

FlowLong introduces a training-free, inference-time method for generating long videos by blending overlapping window predictions with Tweedie matching, ensuring temporal consistency and high visual quality.

Contribution

FlowLong proposes an architecture-agnostic inference procedure that generates longer videos without additional training and outperforms training-free and autoregressive baselines in quality and temporal consistency.

Findings

01 Generates videos several times longer than the model's native window length.

02 Outperforms training-free and autoregressive baselines in quality and temporal consistency.

03 Extends to audio-video joint generation and text-to-3DGS without fine-tuning.

Abstract

Extending the generation horizon of video diffusion models to long sequences remains a long-standing and important challenge. Existing training-free approaches fall into two categories: extensions of bidirectional models, which are tightly coupled to specific architectures and suffer from quality degradation over long horizons, and autoregressive models, which accumulate drift errors due to exposure bias and tend to produce repetitive motion patterns. To address these issues, we propose a novel but simple inference-time approach for long video generation that is architecture-agnostic and requires no additional training. Our method generates long videos via overlapping sliding windows, where predicted clean samples from adjacent windows are blended via Tweedie matching to enforce both manifold constraint and temporal consistency across overlap regions. Stochastic…
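
The overlapping-window idea in the abstract can be illustrated with a small sketch. It is not the paper's Tweedie matching: it assumes each window has already produced a clean-sample prediction (one float per frame here, standing in for a latent) and merges the overlaps with a plain triangular cross-fade. The function names `window_starts` and `blend_windows`, the weighting scheme, and the frame counts are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def window_starts(total_frames: int, window: int, overlap: int) -> List[int]:
    """Start indices of overlapping windows that together cover total_frames."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    if total_frames < window:
        raise ValueError("total_frames must be at least one window long")
    stride = window - overlap
    starts = list(range(0, total_frames - window + 1, stride))
    # If the regular stride leaves a tail uncovered, add a final window
    # flush with the end of the sequence.
    if starts[-1] + window < total_frames:
        starts.append(total_frames - window)
    return starts


def blend_windows(
    preds: Sequence[Sequence[float]],
    starts: Sequence[int],
    total_frames: int,
) -> List[float]:
    """Merge per-window predictions with triangular cross-fade weights.

    Assumption: this weighted average is a stand-in for the blending step,
    not the Tweedie matching rule described in the paper.
    """
    acc = [0.0] * total_frames
    weight_sum = [0.0] * total_frames
    for pred, start in zip(preds, starts):
        n = len(pred)
        for i, value in enumerate(pred):
            # Weight grows toward the window center, shrinks at the edges.
            w = min(i + 1, n - i)
            acc[start + i] += w * value
            weight_sum[start + i] += w
    return [a / w for a, w in zip(acc, weight_sum)]


if __name__ == "__main__":
    starts = window_starts(total_frames=6, window=4, overlap=2)
    # Two toy windows: the first predicts all 0.0, the second all 1.0.
    video = blend_windows([[0.0] * 4, [1.0] * 4], starts, total_frames=6)
    print(starts, video)
```

In this toy run the two overlap frames move gradually from the first window's values to the second's, which is the kind of seam smoothing that overlap blending is meant to provide. In a real sampler the blend would be applied at each denoising step rather than once on finished frames.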

Code & Models

Repositories

jhq1234/flowlong
github