Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion
Jingyuan Chen, Fuchen Long, Jie An, Zhaofan Qiu, Ting Yao, Jiebo Luo, Tao Mei

TL;DR
Ouroboros-Diffusion is a novel video denoising framework that enhances structural and subject consistency for long video generation, addressing limitations of FIFO-Diffusion in maintaining long-range temporal coherence.
Contribution
It introduces a new latent sampling technique, Subject-Aware Cross-Frame Attention, and self-recurrent guidance to improve long video consistency and subject coherence.
Findings
Outperforms existing methods on VBench in subject consistency
Achieves smoother motion and better temporal coherence
Enhances structural and content consistency in long videos
Abstract
The first-in-first-out (FIFO) video diffusion, built on a pre-trained text-to-video model, has recently emerged as an effective approach for tuning-free long video generation. This technique maintains a queue of video frames with progressively increasing noise, continuously producing clean frames at the queue's head while Gaussian noise is enqueued at the tail. However, FIFO-Diffusion often struggles to keep long-range temporal consistency in the generated videos due to the lack of correspondence modeling across frames. In this paper, we propose Ouroboros-Diffusion, a novel video denoising framework designed to enhance structural and content (subject) consistency, enabling the generation of consistent videos of arbitrary length. Specifically, we introduce a new latent sampling technique at the queue tail to improve structural consistency, ensuring perceptually smooth transitions among…
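The queue mechanism described in the abstract can be illustrated with a toy sketch. This is not the FIFO-Diffusion or Ouroboros-Diffusion implementation: `toy_denoise`, `fifo_generate`, the queue length, the latent size, and the linear shrink toward zero are all hypothetical stand-ins for a pre-trained text-to-video denoiser. The sketch only shows the bookkeeping: every frame in the queue takes one denoising step, the clean frame at the head is dequeued, and fresh Gaussian noise is enqueued at the tail.

```python
import random
from collections import deque


def toy_denoise(latent, level):
    """Hypothetical stand-in for one diffusion step (assumption, not a real model).

    Shrinks the latent so that a frame at noise level 1 becomes all zeros ("clean").
    """
    return [x * (level - 1) / level for x in latent]


def fifo_generate(num_frames, queue_len=4, dim=3, seed=0):
    """Toy FIFO loop: returns the emitted frames and the final queue noise levels."""
    rng = random.Random(seed)

    def new_noise():
        # Fresh Gaussian noise, as enqueued at the tail of the queue.
        return [rng.gauss(0.0, 1.0) for _ in range(dim)]

    # Queue entries are (latent, noise_level). The head has the lowest level (1),
    # the tail the highest (queue_len). Initialising from pure noise is a
    # simplification made for this sketch.
    queue = deque((new_noise(), level) for level in range(1, queue_len + 1))

    frames = []
    while len(frames) < num_frames:
        # One denoising step for every frame, lowering each noise level by one.
        queue = deque((toy_denoise(z, lvl), lvl - 1) for z, lvl in queue)

        # The head is now fully denoised: dequeue it as an output frame.
        clean, level = queue.popleft()
        assert level == 0
        frames.append(clean)

        # Enqueue pure Gaussian noise at the tail to keep the queue length fixed.
        queue.append((new_noise(), queue_len))

    return frames, [lvl for _, lvl in queue]


if __name__ == "__main__":
    frames, levels = fifo_generate(num_frames=5)
    print(len(frames), levels)
```

Because the queue length is fixed while frames keep leaving at the head, the loop can run for any `num_frames`, which is what makes the approach suitable for arbitrary-length video. Note that in this toy version each frame is denoised independently; the lack of cross-frame correspondence is the limitation the abstract attributes to FIFO-Diffusion.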
Taxonomy
Topics: Multimedia Communication and Technology
Methods: Softmax · Attention Is All You Need
