Towards Chunk-Wise Generation for Long Videos

Siyang Zhang; Ser-Nam Lim

arXiv:2411.18668·cs.CV·December 2, 2024

Open Access

TL;DR

This paper explores chunk-wise autoregressive methods for generating long videos, addressing memory constraints and inter-chunk consistency, and proposes a $k$-step search solution to improve long video synthesis.

Contribution

It provides a detailed survey of chunk-wise long video generation and introduces an efficient $k$-step search method to enhance inter-chunk coherence.

Findings

01. Chunk-wise autoregressive generation reduces memory load for long videos.

02. The $k$-step search improves consistency between video chunks.

03. Survey highlights challenges and solutions in long video synthesis.

Abstract

Generating long-duration videos has always been a significant challenge due to the inherent complexity of the spatio-temporal domain and the substantial GPU memory required to compute very large tensors. While diffusion-based generative models achieve state-of-the-art performance in video generation, they are typically trained with predefined video resolutions and lengths. During inference, a noise tensor with a specific resolution and length must first be specified, and the model then denoises the entire video tensor at once, all frames together. Such an approach easily runs into an out-of-memory (OOM) problem when the specified resolution and/or length exceeds a certain limit. One solution to this problem is to generate many short video chunks autoregressively with strong inter-chunk spatio-temporal relation and then concatenate them together…
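The chunk-wise idea described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: each frame is reduced to a single float, `denoise_chunk` is a hypothetical stand-in for a real diffusion sampler, and conditioning each chunk on the last frame of the previous one is just one plausible way to link chunks. The point it shows is that only `chunk_len` frames are denoised at any one time, regardless of the total video length.

```python
import random


def denoise_chunk(noise, cond_frame, steps=10):
    """Hypothetical stand-in for a diffusion sampler (NOT the paper's model).

    Each step averages every frame with its predecessor, starting from the
    conditioning frame, so the chunk is pulled toward a smooth continuation
    of `cond_frame`.
    """
    frames = list(noise)
    for _ in range(steps):
        prev = cond_frame
        smoothed = []
        for frame in frames:
            value = 0.5 * frame + 0.5 * prev
            smoothed.append(value)
            prev = value
        frames = smoothed
    return frames


def generate_long_video(num_chunks, chunk_len, first_frame=0.0, seed=0):
    """Generate chunks autoregressively and concatenate them."""
    rng = random.Random(seed)
    video = []
    cond = first_frame
    for _ in range(num_chunks):
        # Only one chunk of noise is held and denoised at a time.
        noise = [rng.gauss(0.0, 1.0) for _ in range(chunk_len)]
        chunk = denoise_chunk(noise, cond)
        video.extend(chunk)
        # Assumption: the last generated frame conditions the next chunk.
        cond = chunk[-1]
    return video


video = generate_long_video(num_chunks=4, chunk_len=8)
print(len(video))  # 32
```

In this toy setup the working set per step is `chunk_len` frames rather than `num_chunks * chunk_len`, which is the memory argument the abstract makes; how well adjacent chunks fit together then depends on the conditioning and on the initial noise drawn for each chunk.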

Taxonomy

Topics: Image and Video Quality Assessment · Video Coding and Compression Technologies · Advanced Image Processing Techniques

Methods: Diffusion