Fine-gained Zero-shot Video Sampling
Dengsheng Chen, Jie Hu, Xiaoming Wei, Enhua Wu

TL;DR
The paper introduces $\mathcal{ZS}^2$, a zero-shot video sampling method that generates high-quality, fine-grained videos from image diffusion models without training, outperforming some supervised approaches.
Contribution
It presents a novel zero-shot video sampling algorithm that leverages dependency noise and temporal momentum attention to produce detailed videos from image models.
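To make the idea of frame-dependent noise more concrete, the following is a minimal sketch of one generic way to draw initial noise that is correlated across frames while each frame stays marginally standard normal. This is not the paper's dependency noise model, whose details are not given in this summary; the function name `correlated_frame_noise` and the parameter `rho` are hypothetical and introduced only for illustration.

```python
import numpy as np


def correlated_frame_noise(num_frames, shape, rho, seed=0):
    """Draw per-frame Gaussian noise with pairwise correlation `rho`.

    Assumption (not from the paper): every frame mixes one shared noise
    map with its own independent noise map. The mixing weights keep each
    frame at unit variance, so the result could in principle be used as
    the initial latent of a standard image diffusion sampler.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    rng = np.random.default_rng(seed)
    # One noise map shared by all frames (the source of temporal coherence).
    shared = rng.standard_normal(shape)
    # One independent noise map per frame (the source of frame variation).
    independent = rng.standard_normal((num_frames, *shape))
    # Var = rho + (1 - rho) = 1; Cov between two distinct frames = rho.
    return np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * independent


if __name__ == "__main__":
    noise = correlated_frame_noise(num_frames=8, shape=(64, 64), rho=0.8)
    print(noise.shape)
```

With `rho = 1` every frame receives identical noise, and with `rho = 0` the frames are fully independent; intermediate values trade temporal consistency against per-frame variation.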
Findings
Achieves state-of-the-art zero-shot video generation performance.
Outperforms some recent supervised methods.
Enables high-quality, fine-grained video synthesis from pretrained image diffusion models.
Abstract
Incorporating a temporal dimension into pretrained image diffusion models for video generation is a prevalent approach. However, this method is computationally demanding and necessitates large-scale video datasets. More critically, the heterogeneity between image and video datasets often results in catastrophic forgetting of the image expertise. Recent attempts to directly extract video snippets from image diffusion models have somewhat mitigated these problems. Nevertheless, these methods can only generate brief video clips with simple movements and fail to capture fine-grained motion or non-rigid deformation. In this paper, we propose a novel Zero-Shot video Sampling algorithm, denoted as $\mathcal{ZS}^2$, capable of directly sampling high-quality video clips from existing image synthesis methods, such as Stable Diffusion, without any training or optimization. Specifically,…
Taxonomy
Topics: Photoacoustic and Ultrasonic Imaging
Methods: Softmax · Attention Is All You Need · Diffusion
