UFO: Enhancing Diffusion-Based Video Generation with a Uniform Frame Organizer
Delong Liu, Zhaohui Hou, Mingjie Zhan, Shihao Han, Zhicheng Zhao, Fei Su

TL;DR
UFO is a versatile plug-in that improves the consistency and quality of diffusion-based video generation models through adaptive adapters. It leaves the original model unaltered, transfers across models, and supports stylized training.
Contribution
We introduce UFO, a modular, efficient, and transferable plug-in that enhances diffusion-based video generation quality and consistency without retraining original models.
Findings
UFO significantly improves video consistency and quality.
UFO demonstrates superior performance on public benchmarks.
UFO supports stylized and transfer learning across models.
Abstract
Recently, diffusion-based video generation models have achieved significant success. However, existing models often suffer from issues like weak consistency and declining image quality over time. To overcome these challenges, inspired by aesthetic principles, we propose a non-invasive plug-in called Uniform Frame Organizer (UFO), which is compatible with any diffusion-based video generation model. The UFO comprises a series of adaptive adapters with adjustable intensities, which can significantly enhance the consistency between the foreground and background of videos and improve image quality without altering the original model parameters when integrated. The training for UFO is simple, efficient, requires minimal resources, and supports stylized training. Its modular design allows for the combination of multiple UFOs, enabling the customization of personalized video generation models.…
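To make the idea of "adaptive adapters with adjustable intensities" concrete, here is a minimal sketch in plain Python. The abstract does not specify the adapter architecture, so the additive low-rank form used here (similar in spirit to LoRA) is an assumption, and the names `Adapter`, `AdaptedLinear`, and `intensity` are hypothetical, not taken from the paper. The sketch only illustrates three properties the abstract claims: the base weights stay untouched, each adapter has a tunable strength, and several adapters can be combined.

```python
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in m]


@dataclass
class Adapter:
    """Hypothetical low-rank adapter: delta(x) = up @ (down @ x)."""
    down: Matrix            # shape: rank x in_dim
    up: Matrix              # shape: out_dim x rank
    intensity: float = 1.0  # adjustable strength; 0.0 disables the adapter

    def delta(self, x: Sequence[float]) -> Vector:
        return matvec(self.up, matvec(self.down, x))


class AdaptedLinear:
    """A frozen base layer plus any number of additive adapters."""

    def __init__(self, base_weight: Matrix, adapters: List[Adapter]):
        self.base_weight = base_weight  # read only, never modified here
        self.adapters = adapters

    def forward(self, x: Sequence[float]) -> Vector:
        out = matvec(self.base_weight, x)
        for adapter in self.adapters:
            d = adapter.delta(x)
            out = [o + adapter.intensity * di for o, di in zip(out, d)]
        return out


# Toy example: an identity base layer with two stacked adapters.
base = [[1.0, 0.0], [0.0, 1.0]]
consistency = Adapter(down=[[1.0, 1.0]], up=[[0.5], [0.5]], intensity=1.0)
style = Adapter(down=[[1.0, -1.0]], up=[[1.0], [0.0]], intensity=0.0)
layer = AdaptedLinear(base, [consistency, style])

x = [2.0, 4.0]
y_on = layer.forward(x)      # first adapter active, second switched off
consistency.intensity = 0.0
y_off = layer.forward(x)     # all adapters off: plain base-layer output
```

Setting every intensity to zero recovers the base model's output exactly, which is one simple way a plug-in of this kind could remain non-invasive. How UFO actually places and trains its adapters inside a video diffusion model is not described in this summary.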
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis
