Free-Lunch Long Video Generation via Layer-Adaptive O.O.D Correction
Jiahao Tian, Chenxi Song, Wei Cheng, Chi Zhang

TL;DR
This paper introduces FreeLOC, a training-free framework that improves long video generation with diffusion models trained on short clips. It addresses out-of-distribution (O.O.D) problems through layer-adaptive position re-encoding and sparse attention strategies, improving both quality and consistency.
Contribution
The paper proposes a novel, training-free, layer-adaptive framework built on two techniques, Video-based Relative Position Re-encoding (VRPR) and Tiered Sparse Attention (TSA), to generate long videos from pre-trained short-video diffusion models.
Findings
Outperforms existing training-free methods in quality and consistency.
Achieves state-of-the-art results in long video generation.
Effectively addresses frame-level and context-length out-of-distribution problems.
Abstract
Generating long videos using pre-trained video diffusion models, which are typically trained on short clips, presents a significant challenge. Directly applying these models to long-video inference often leads to a notable degradation in visual quality. This paper identifies that this issue primarily stems from two out-of-distribution (O.O.D) problems: frame-level relative position O.O.D and context-length O.O.D. To address these challenges, we propose FreeLOC, a novel training-free, layer-adaptive framework with two core techniques: Video-based Relative Position Re-encoding (VRPR), a multi-granularity strategy that hierarchically re-encodes temporal relative positions to align with the model's pre-trained distribution and thereby addresses frame-level relative position O.O.D; and Tiered Sparse Attention (TSA), which addresses context-length O.O.D and preserves both local detail and long-range…
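The abstract names VRPR and TSA but does not give their formulas, so the following is only a hypothetical sketch of the two general ideas, not the authors' implementation. It assumes (1) a tiered mapping that keeps short frame distances exact and compresses longer ones, loosely in the spirit of grouped-position schemes such as SelfExtend, and clips the result to an assumed training range `MAX_POS`; and (2) a frame-level attention mask that is dense for nearby frames and progressively sparser for distant ones. The tier values, `MAX_POS`, and the functions `remap_distance`, `relative_position_matrix`, and `tiered_sparse_mask` are all illustrative assumptions.

```python
import numpy as np

# Hypothetical tier settings, chosen only for illustration.
# Each tier is (upper bound on |frame distance| or None for "unbounded", stride).
POS_TIERS = [(8, 1), (32, 2), (None, 4)]   # exact, then 2x coarser, then 4x coarser
ATTN_TIERS = [(2, 1), (8, 2), (None, 4)]   # dense, then every 2nd, then every 4th frame
MAX_POS = 15                               # assumed largest distance seen in training


def remap_distance(d, tiers, max_pos):
    """Re-encode a signed frame distance with coarser steps in farther tiers.

    Inside the first tier the distance is kept exactly; in each later tier the
    mapped value advances by one every `stride` frames. The magnitude is then
    clipped to max_pos so it stays inside the assumed training range.
    """
    sign = -1 if d < 0 else 1
    a = abs(d)
    mapped, start = 0, 0
    for end, stride in tiers:
        stop = a if end is None else min(a, end)
        if stop > start:
            mapped += (stop - start) // stride
        if end is None or a < end:
            break  # the distance ends inside this tier
        start = end
    return sign * min(mapped, max_pos)


def relative_position_matrix(num_frames, tiers, max_pos):
    """Matrix of re-encoded (query frame minus key frame) distances."""
    idx = np.arange(num_frames)
    raw = idx[:, None] - idx[None, :]
    remap = np.vectorize(lambda d: remap_distance(int(d), tiers, max_pos), otypes=[int])
    return remap(raw)


def tiered_sparse_mask(num_frames, tiers):
    """Boolean frame-level mask: True where query frame i may attend to key frame j.

    A key is kept if, for some tier, its distance is within the tier bound and
    is a multiple of the tier stride, so nearby frames are dense and distant
    frames are sampled sparsely.
    """
    idx = np.arange(num_frames)
    dist = np.abs(idx[:, None] - idx[None, :])
    mask = np.zeros((num_frames, num_frames), dtype=bool)
    for bound, stride in tiers:
        in_range = np.ones_like(mask) if bound is None else dist <= bound
        mask |= in_range & (dist % stride == 0)
    return mask


if __name__ == "__main__":
    rel = relative_position_matrix(48, POS_TIERS, MAX_POS)
    mask = tiered_sparse_mask(48, ATTN_TIERS)
    print(rel[:12, 0])            # re-encoded distances from frame 0
    print(mask.sum(axis=1)[:4])   # number of keys kept per query for the first frames
```

How the re-encoded distances would feed into the model's positional encoding, and which layers receive which treatment (the paper's layer-adaptive choice), are not modeled here; the sketch only shows how tiered re-encoding keeps every distance within a fixed range and how a tiered mask limits attention to nearby frames plus sparsely sampled distant ones.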
Taxonomy
Topics: Video Coding and Compression Technologies · Image and Video Quality Assessment · Advanced Vision and Imaging
