FreeLong++: Training-Free Long Video Generation via Multi-band SpectralFusion

Yu Lu; Yi Yang

arXiv:2507.00162·cs.CV·July 2, 2025

TL;DR

FreeLong++ is a training-free, multi-scale spectral fusion framework that significantly improves the quality and temporal consistency of long video generation from existing short-video models without additional training.

Contribution

The paper introduces a multi-branch, multi-scale frequency fusion architecture that improves long video generation quality without extra training.

Findings

01 Outperforms previous methods on longer video generation tasks.

02 Enables coherent multi-prompt video generation with smooth transitions.

03 Supports controllable video generation using depth or pose sequences.

Abstract

Recent advances in video generation models have enabled high-quality short video generation from text prompts. However, extending these models to longer videos remains a significant challenge, primarily due to degraded temporal consistency and visual fidelity. Our preliminary observations show that naively applying short-video generation models to longer sequences leads to noticeable quality degradation. Further analysis identifies a systematic trend where high-frequency components become increasingly distorted as video length grows, an issue we term high-frequency distortion. To address this, we propose FreeLong, a training-free framework designed to balance the frequency distribution of long video features during the denoising process. FreeLong achieves this by blending global low-frequency features, which capture holistic semantics across the full video, with local high-frequency…
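
The blending idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`temporal_lowpass_mask`, `blend_frequencies`), the `cutoff_ratio` parameter, the hard binary mask, and the choice to filter only along the frame axis are all assumptions made for illustration. The sketch takes a "global" feature tensor and a "local" feature tensor of the same shape, keeps the low temporal frequencies of the first and the high temporal frequencies of the second, and sums them in the frequency domain.

```python
import numpy as np


def temporal_lowpass_mask(num_frames, cutoff_ratio):
    """Binary low-pass mask over temporal frequencies (hypothetical helper).

    cutoff_ratio is a fraction of the Nyquist frequency (0.5 cycles/frame).
    """
    freqs = np.abs(np.fft.fftfreq(num_frames))  # values in [0, 0.5]
    return (freqs <= cutoff_ratio * 0.5).astype(np.float64)


def blend_frequencies(global_feat, local_feat, cutoff_ratio=0.25):
    """Fuse low temporal frequencies of global_feat with high ones of local_feat.

    Both inputs are real arrays whose first axis is the frame (time) axis.
    Assumed illustration only; the paper's actual filter design may differ.
    """
    if global_feat.shape != local_feat.shape:
        raise ValueError("global_feat and local_feat must have the same shape")

    num_frames = global_feat.shape[0]
    mask = temporal_lowpass_mask(num_frames, cutoff_ratio)
    # Reshape so the mask broadcasts over all non-temporal axes.
    mask = mask.reshape((num_frames,) + (1,) * (global_feat.ndim - 1))

    global_spec = np.fft.fft(global_feat, axis=0)
    local_spec = np.fft.fft(local_feat, axis=0)

    # Low band from the global branch, complementary high band from the local one.
    fused_spec = mask * global_spec + (1.0 - mask) * local_spec

    # The mask is symmetric in |frequency|, so the inverse transform of real
    # inputs is real up to floating-point error; drop the residual imaginary part.
    return np.fft.ifft(fused_spec, axis=0).real


if __name__ == "__main__":
    frames = np.arange(16)
    slow = np.full((16, 2), 3.0)                          # constant over time
    fast = np.tile(((-1.0) ** frames)[:, None], (1, 2))   # alternates every frame
    fused = blend_frequencies(slow, fast)
    print(fused.shape)
```

Because the two masks are complementary, passing the same tensor as both inputs returns it unchanged, which is a convenient sanity check for any variant of this filter.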
