Free-Lunch Long Video Generation via Layer-Adaptive O.O.D Correction

Jiahao Tian; Chenxi Song; Wei Cheng; Chi Zhang

arXiv:2603.25209 · cs.CV · March 27, 2026

TL;DR

This paper introduces FreeLOC, a training-free framework for generating long videos with diffusion models that were trained only on short clips. It targets two out-of-distribution problems, frame-level relative position and context length, using layer-adaptive position re-encoding and sparse attention to improve quality and consistency.

Contribution

The paper proposes a training-free, layer-adaptive method that combines hierarchical relative-position re-encoding with tiered sparse attention to generate long videos from pre-trained short-video models.

Findings

01. Outperforms existing training-free methods in quality and consistency.

02. Achieves state-of-the-art results in long video generation.

03. Effectively addresses frame-level and context-length out-of-distribution problems.

Abstract

Generating long videos using pre-trained video diffusion models, which are typically trained on short clips, presents a significant challenge. Directly applying these models for long-video inference often leads to a notable degradation in visual quality. This paper identifies that this issue primarily stems from two out-of-distribution (O.O.D) problems: frame-level relative position O.O.D and context-length O.O.D. To address these challenges, we propose FreeLOC, a novel training-free, layer-adaptive framework that introduces two core techniques. Video-based Relative Position Re-encoding (VRPR) targets frame-level relative position O.O.D: it is a multi-granularity strategy that hierarchically re-encodes temporal relative positions to align with the model's pre-trained distribution. Tiered Sparse Attention (TSA) targets context-length O.O.D and preserves both local detail and long-range…
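To make the two O.O.D problems concrete, here is a minimal Python sketch of the general ideas the abstract names. It is not the paper's formulation, which this page does not spell out. The function names `reencode_offset` and `tiered_mask`, the parameters `trained_max`, `fine_window`, `group`, `local_window`, and `stride`, and the specific bucketing and striding rules are all illustrative assumptions. The layer-adaptive part of FreeLOC is not modeled here.

```python
from typing import List


def reencode_offset(delta: int, trained_max: int, fine_window: int, group: int) -> int:
    """Map a frame-level relative offset into [-trained_max, trained_max].

    Hypothetical two-granularity rule (an assumption, not the paper's VRPR):
    offsets within `fine_window` are kept exact, larger offsets are bucketed
    in groups of `group` frames, and the result is clamped to the range the
    model saw during training. Assumes fine_window <= trained_max.
    """
    sign = -1 if delta < 0 else 1
    distance = abs(delta)
    if distance <= fine_window:
        return delta  # fine granularity: nearby frames keep exact offsets
    # Coarse granularity: ceil-divide the excess distance into buckets.
    coarse = fine_window + (distance - fine_window + group - 1) // group
    return sign * min(coarse, trained_max)


def tiered_mask(num_frames: int, local_window: int, stride: int) -> List[List[bool]]:
    """Build a frame-level attention mask with two tiers.

    Hypothetical rule (an assumption, not the paper's TSA): query frame q
    attends densely to frames within `local_window`, and sparsely to every
    `stride`-th frame elsewhere, which bounds the effective context length.
    """
    return [
        [abs(q - k) <= local_window or k % stride == 0 for k in range(num_frames)]
        for q in range(num_frames)
    ]


if __name__ == "__main__":
    # Offsets beyond the trained range are folded back inside it.
    for delta in (5, 12, 100, -100):
        print(delta, "->", reencode_offset(delta, trained_max=16, fine_window=8, group=4))
    # Number of frames each query attends to under the two-tier mask.
    mask = tiered_mask(num_frames=8, local_window=1, stride=4)
    print([sum(row) for row in mask])
```

In this sketch the re-encoding keeps every relative position inside the range seen during training, while the mask keeps the number of attended frames per query far below the full sequence length. Both are the properties the abstract attributes to VRPR and TSA respectively, realized here with deliberately simple stand-in rules.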

Taxonomy

Topics: Video Coding and Compression Technologies · Image and Video Quality Assessment · Advanced Vision and Imaging