CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion
Xingrui Wang, Xin Li, Zhibo Chen

TL;DR
CoNo introduces a novel noise injection method with a look-back mechanism and long-term regularization to improve scene consistency in long video diffusion without retraining.
Contribution
The paper proposes CoNo, a method that improves long video generation by modeling fine-grained scene transitions and maintaining content consistency without additional training.
Findings
Improves scene consistency in long video generation.
Effective under single- and multi-text prompts.
Reduces abrupt scene transitions.
Abstract
Tuning-free long video diffusion has been proposed to generate extended-duration videos with enriched content by reusing the knowledge of a pre-trained short video diffusion model without retraining. However, most works overlook fine-grained modeling of long-term video consistency, resulting in limited scene consistency (i.e., unreasonable object or background transitions), especially with multiple text inputs. To mitigate this, we propose Consistency Noise Injection, dubbed CoNo, which introduces a "look-back" mechanism to enhance the fine-grained scene transition between different video clips, and designs a long-term consistency regularization to eliminate content shifts when extending video content through noise prediction. In particular, the "look-back" mechanism breaks the noise scheduling process into three essential parts, where one internal noise prediction part is…
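As a rough illustration of what "noise injection" into already-generated frames can look like, the sketch below applies the standard forward-diffusion noising step, sqrt(ᾱ)·x + sqrt(1 − ᾱ)·ε, to the last frames of a finished clip so they could seed the next clip. This is a generic sketch, not CoNo's three-part noise schedule, which the truncated abstract above does not fully describe. The function names (`inject_noise`, `renoise_tail`), the `overlap` parameter, and the toy frame representation are all assumptions made for illustration.

```python
import math
import random


def inject_noise(frame, alpha_bar, rng):
    """Standard forward-diffusion noising: sqrt(a) * x + sqrt(1 - a) * eps.

    `frame` is a flat list of floats standing in for pixel/latent values
    (a toy representation assumed for this sketch).
    """
    if not 0.0 <= alpha_bar <= 1.0:
        raise ValueError("alpha_bar must lie in [0, 1]")
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * x + noise_scale * rng.gauss(0.0, 1.0) for x in frame]


def renoise_tail(prev_clip, overlap, alpha_bar, rng):
    """Re-noise the last `overlap` frames of a finished clip.

    Hypothetical helper: the noised frames could initialize the first
    frames of the next clip, so the new clip starts from content related
    to the previous one rather than from pure noise.
    """
    if not 0 < overlap <= len(prev_clip):
        raise ValueError("overlap must be between 1 and len(prev_clip)")
    return [inject_noise(f, alpha_bar, rng) for f in prev_clip[-overlap:]]


rng = random.Random(0)
prev_clip = [[float(i)] * 4 for i in range(8)]  # 8 toy frames, 4 values each
init_frames = renoise_tail(prev_clip, overlap=2, alpha_bar=0.5, rng=rng)
```

A smaller `alpha_bar` discards more of the previous clip's content, while `alpha_bar = 1.0` copies the frames unchanged. How much to retain at each stage is the kind of trade-off a schedule such as CoNo's would have to settle.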
Taxonomy
Topics: Integrated Circuits and Semiconductor Failure Analysis · Electrostatic Discharge in Electronics · 3D IC and TSV technologies
Methods: Contrastive Language-Image Pre-training · Diffusion
