SinFusion: Training Diffusion Models on a Single Image or Video
Yaniv Nikankin, Niv Haim, Michal Irani

TL;DR
SinFusion introduces a diffusion model trained on a single image or video, enabling diverse manipulation and extrapolation tasks that are challenging for existing methods, by learning appearance and dynamics from minimal input.
Contribution
The paper presents SinFusion, a novel diffusion model trained on a single input, capable of performing diverse image and video manipulation tasks that previous models cannot handle.
Findings
SinFusion can generate diverse new videos of the same dynamic scene from a single input video.
It can extrapolate short videos into longer sequences, both forward and backward in time.
It can perform video upsampling.
Abstract
Diffusion models exhibited tremendous progress in image and video generation, exceeding GANs in quality and diversity. However, they are usually trained on very large datasets and are not naturally adapted to manipulate a given input image or video. In this paper we show how this can be resolved by training a diffusion model on a single input image or video. Our image/video-specific diffusion model (SinFusion) learns the appearance and dynamics of the single image or video, while utilizing the conditioning capabilities of diffusion models. It can solve a wide array of image/video-specific manipulation tasks. In particular, our model can learn from few frames the motion and dynamics of a single input video. It can then generate diverse new video samples of the same dynamic scene, extrapolate short videos into long ones (both forward and backward in time) and perform video upsampling.…
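The abstract does not spell out the training procedure. As a rough illustration of what training a diffusion model on a single image could look like, the sketch below assumes (this is not stated above) that training examples are random crops of the one input image, each noised with the standard DDPM forward process. All function names and parameter values are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (a common DDPM default, assumed here)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out

def random_crop(image, crop_h, crop_w, rng):
    """Cut a random crop_h x crop_w window out of a 2D list-of-lists image."""
    top = rng.randrange(len(image) - crop_h + 1)
    left = rng.randrange(len(image[0]) - crop_w + 1)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

def add_noise(crop, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [[rng.gauss(0.0, 1.0) for _ in row] for row in crop]
    noisy = [[signal * x + sigma * e for x, e in zip(row, eps_row)]
             for row, eps_row in zip(crop, noise)]
    return noisy, noise

rng = random.Random(0)
# Toy 8x8 grayscale "image" standing in for the single training input.
image = [[(r * 8 + c) / 63.0 for c in range(8)] for r in range(8)]
alpha_bars = cumulative_alphas(linear_beta_schedule(num_steps=50))

# One training example: a random crop, a random timestep, and the noise target.
crop = random_crop(image, 4, 4, rng)
t = rng.randrange(len(alpha_bars))
noisy_crop, target_noise = add_noise(crop, t, alpha_bars, rng)
```

In a full training loop under these assumptions, a denoising network would receive `noisy_crop` and `t` and be trained to predict `target_noise`; repeating this over many crops and timesteps is what lets a model see many distinct examples from one image.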
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Advanced Neuroimaging Techniques and Applications
Methods: Diffusion
