Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis
Bichen Wu, Ching-Yao Chuang, Xiaoyan Wang, Yichen Jia, Kapil Krishnakumar, Tong Xiao, Feng Liang, Licheng Yu, Peter Vajda

TL;DR
Fairy is a fast, efficient video editing diffusion model that ensures high temporal coherence and quality, significantly outperforming previous methods in speed and fidelity.
Contribution
We introduce Fairy, a video-to-video synthesis model that combines anchor-based cross-frame attention with a data augmentation strategy for affine equivariance, achieving fast generation (a 120-frame clip in 14 seconds) and improved temporal consistency.
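As a rough illustration of the data augmentation idea, the sketch below applies one randomly drawn affine transform to both images of a source/target editing pair, so that a model trained on such pairs is encouraged to produce edits that move with the input. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names (`warp_affine`, `random_affine`, `augment_pair`), the parameter ranges, the nearest-neighbour sampling, and the restriction to scale plus translation are all hypothetical choices made for illustration.

```python
import random

def warp_affine(img, m, fill=0):
    """Warp a 2D image (list of rows) with nearest-neighbour sampling.

    m = (a, b, tx, c, d, ty) maps each OUTPUT pixel (x, y) to the source
    location (a*x + b*y + tx, c*x + d*y + ty). Out-of-bounds samples
    are replaced by `fill`.
    """
    h, w = len(img), len(img[0])
    a, b, tx, c, d, ty = m
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            sx = round(a * x + b * y + tx)
            sy = round(c * x + d * y + ty)
            row.append(img[sy][sx] if 0 <= sx < w and 0 <= sy < h else fill)
        out.append(row)
    return out

def random_affine(rng, max_shift=2.0, max_scale=0.1):
    """Draw a random uniform-scale-plus-translation transform (assumed ranges)."""
    s = 1.0 + rng.uniform(-max_scale, max_scale)
    return (s, 0.0, rng.uniform(-max_shift, max_shift),
            0.0, s, rng.uniform(-max_shift, max_shift))

def augment_pair(source, target, rng):
    """Apply the SAME random transform to the source and target images."""
    m = random_affine(rng)
    return warp_affine(source, m), warp_affine(target, m)
```

The key design point is that the transform is sampled once per pair and shared, so the spatial relationship between the source and its edited target is preserved after augmentation.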
Findings
Generates 120-frame 512x384 videos in 14 seconds, at least 44x faster than prior methods.
Outperforms existing models on quality in a user study involving 1000 generated samples.
Ensures high temporal coherence and fidelity in video synthesis.
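The speed figures above can be unpacked with a little arithmetic. The sketch below uses only the numbers quoted in this summary (120 frames, 30 FPS playback, 14 seconds of generation, a 44x factor). The last line is an implication rather than a reported measurement: it assumes the 44x factor is measured on the same 120-frame workload.

```python
# Figures quoted in the summary.
FRAMES = 120
PLAYBACK_FPS = 30
GENERATION_SECONDS = 14
SPEEDUP_LOWER_BOUND = 44

# Duration of the generated clip when played back.
clip_seconds = FRAMES / PLAYBACK_FPS  # 4.0 s

# Frames synthesized per second of wall-clock time.
generation_fps = FRAMES / GENERATION_SECONDS  # about 8.57

# How much longer generation takes than playback.
generation_to_playback_ratio = GENERATION_SECONDS / clip_seconds  # 3.5

# Implied minimum runtime of the compared prior methods (assumption:
# same workload), not a number reported in this summary.
implied_prior_seconds = SPEEDUP_LOWER_BOUND * GENERATION_SECONDS  # 616 s
```

Under these numbers, generation runs at roughly 8.6 frames per second, about 3.5 times slower than 30 FPS playback.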
Abstract
In this paper, we introduce Fairy, a minimalist yet robust adaptation of image-editing diffusion models, enhancing them for video editing applications. Our approach centers on the concept of anchor-based cross-frame attention, a mechanism that implicitly propagates diffusion features across frames, ensuring superior temporal coherence and high-fidelity synthesis. Fairy not only addresses limitations of previous models, including memory and processing speed, but also improves temporal consistency through a unique data augmentation strategy that renders the model equivariant to affine transformations in both source and target images. Remarkably efficient, Fairy generates 120-frame 512x384 videos (4-second duration at 30 FPS) in just 14 seconds, outpacing prior works by at least 44x. A comprehensive user study, involving 1000 generated samples, confirms that our approach delivers…
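The anchor-based cross-frame attention named in the abstract can be illustrated with a toy sketch. The abstract does not spell out the mechanism, so the code below rests on assumptions: every frame's tokens act as queries against the pooled tokens of a small set of anchor frames, and tokens serve directly as their own keys and values (a real model would use learned projections on diffusion features). The names `attend`, `anchor_cross_frame_attention`, and `anchor_ids` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

def anchor_cross_frame_attention(frames, anchor_ids):
    """frames: list of frames, each a list of token feature vectors.

    Each frame's tokens attend only to the tokens of the anchor frames.
    Toy simplification: tokens are used as keys and values directly.
    """
    anchor_tokens = [tok for i in anchor_ids for tok in frames[i]]
    return [attend(frame, anchor_tokens, anchor_tokens) for frame in frames]

# Three frames, two 2-D tokens each; frame 0 is the only anchor.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
    [[0.0, 0.0], [2.0, 0.0]],
]
out = anchor_cross_frame_attention(frames, anchor_ids=[0])
```

In this sketch each frame depends only on the shared anchor tokens and not on its neighbours, so frames could be processed independently once the anchor features are available. That independence is one plausible reading of how such a design lends itself to parallel generation.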
Taxonomy
Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Video Coding and Compression Technologies
Methods: Diffusion
