FIFO-Diffusion: Generating Infinite Videos from Text without Training

Jihwan Kim; Junoh Kang; Jinyoung Choi; Bohyung Han

arXiv:2405.11473 · cs.CV · November 5, 2024 · 2 cites

TL;DR

FIFO-Diffusion is an inference method for text-conditional video generation that is conceptually capable of producing infinitely long videos from a pretrained diffusion model without additional training. It includes techniques that reduce the training-inference gap and supports parallel inference across multiple GPUs.

Contribution

The paper presents FIFO-Diffusion, an inference technique that enables infinite video generation without additional training by combining diagonal denoising, latent partitioning, and lookahead denoising.

Findings

01 Produces infinitely long videos with constant memory usage.

02 Effective on existing text-to-video generation baselines.

03 Enables parallel inference on multiple GPUs.

Abstract

We propose a novel inference technique based on a pretrained diffusion model for text-conditional video generation. Our approach, called FIFO-Diffusion, is conceptually capable of generating infinitely long videos without additional training. This is achieved by iteratively performing diagonal denoising, which simultaneously processes a series of consecutive frames with increasing noise levels in a queue; our method dequeues a fully denoised frame at the head while enqueuing a new random noise frame at the tail. However, diagonal denoising is a double-edged sword as the frames near the tail can take advantage of cleaner frames by forward reference but such a strategy induces the discrepancy between training and inference. Hence, we introduce latent partitioning to reduce the training-inference gap and lookahead denoising to leverage the benefit of forward referencing. Practically,…
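
The queue mechanics described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `toy_denoise_step` and `fifo_generate` are hypothetical names, frames are plain floats rather than video latents, and the queue is filled by a simple warm-up loop, which is a simplification assumed here. It shows only the diagonal pattern: noise levels increase from head to tail, every step denoises the whole queue by one level, a clean frame leaves at the head, and fresh noise enters at the tail.

```python
from collections import deque
import random


def toy_denoise_step(frames, levels):
    """Hypothetical stand-in for one joint denoising step of a video model.

    Frame i moves from noise level levels[i] to levels[i] - 1; a level-1
    frame becomes fully clean (0.0 in this toy).
    """
    return [x * (lvl - 1) / lvl for x, lvl in zip(frames, levels)]


def fifo_generate(num_frames, queue_len, seed=0):
    """Emit num_frames clean frames while holding at most queue_len frames."""
    rng = random.Random(seed)
    queue = deque()  # entries are [frame, noise_level]; head is leftmost
    outputs, peak = [], 0
    while len(outputs) < num_frames:
        # Enqueue a fresh pure-noise frame at the tail (highest noise level).
        queue.append([rng.gauss(0.0, 1.0), queue_len])
        peak = max(peak, len(queue))
        # One step over the whole queue: levels increase from head to tail.
        frames = [f for f, _ in queue]
        levels = [lvl for _, lvl in queue]
        for entry, new in zip(queue, toy_denoise_step(frames, levels)):
            entry[0], entry[1] = new, entry[1] - 1
        # Dequeue the head once it is fully denoised.
        if queue[0][1] == 0:
            outputs.append(queue.popleft()[0])
    return outputs, peak
```

Calling `fifo_generate(10, 4)` returns ten frames together with the peak queue size, which never exceeds `queue_len` however many frames are requested. That bounded queue is the sense in which memory stays constant in this sketch.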

Code & Models

Repositories

jjihwan/FIFO-Diffusion_public (PyTorch, official)

Videos

FIFO-Diffusion: Generating Infinite Videos from Text without Training (SlidesLive)

Taxonomy

Topics: Video Analysis and Summarization

Methods: Diffusion