EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation
Cong Wang, Jiaxi Gu, Panwen Hu, Haoyu Zhao, Yuanfan Guo, Jianhua Han, Hang Xu, Xiaodan Liang

TL;DR
EasyControl introduces a universal framework that enables flexible, multi-condition control of video diffusion models, significantly enhancing generation quality and control precision over existing methods.
Contribution
The paper presents EasyControl, a novel method for integrating various control signals into pre-trained video diffusion models using condition adapters, improving controllability and performance.
Findings
Outperforms state-of-the-art methods on public datasets.
Achieves an improvement of 152.0 in FVD (lower is better) for sketch-to-video generation.
Demonstrates high fidelity and image retention in generated videos.
Abstract
Following the advancements in text-guided image generation technology exemplified by Stable Diffusion, video generation is gaining increased attention in the academic community. However, relying solely on text guidance for video generation has serious limitations, as videos contain much richer content than images, especially in terms of motion. This information can hardly be adequately described with plain text. Fortunately, in computer vision, various visual representations can serve as additional control signals to guide generation. With the help of these signals, video generation can be controlled in finer detail, allowing for greater flexibility for different applications. Integrating various controls, however, is nontrivial. In this paper, we propose a universal framework called EasyControl. By propagating and injecting condition features through condition adapters, our method…
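As a rough illustration of what "injecting condition features through condition adapters" could look like, here is a minimal sketch in plain Python. It is not the paper's implementation: the names `ConditionAdapter`, `inject`, and `matvec`, the toy vector shapes, and the zero-initialized projection (borrowed from ControlNet-style zero layers) are all assumptions made for this example.

```python
# Hypothetical sketch: names and shapes are illustrative, not from the paper.
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(weight: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


class ConditionAdapter:
    """Toy adapter: projects a condition vector to the backbone's feature width."""

    def __init__(self, cond_dim: int, feat_dim: int) -> None:
        # Assumption: zero initialization, as in ControlNet-style zero layers,
        # so an untrained adapter leaves the pre-trained backbone unchanged.
        self.weight: Matrix = [[0.0] * cond_dim for _ in range(feat_dim)]

    def __call__(self, condition: Vector) -> Vector:
        return matvec(self.weight, condition)


def inject(frame_features: List[Vector], condition_features: List[Vector]) -> List[Vector]:
    """Add per-frame condition features to per-frame backbone features."""
    return [
        [f + c for f, c in zip(feat, cond)]
        for feat, cond in zip(frame_features, condition_features)
    ]


# Two frames, feature width 3; one 2-dim condition (e.g. a flattened sketch) per frame.
backbone = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
conditions = [[0.5, -0.5], [1.0, 1.0]]

adapter = ConditionAdapter(cond_dim=2, feat_dim=3)
untrained = inject(backbone, [adapter(c) for c in conditions])

# Pretend training moved one weight: the condition now shifts the first channel.
adapter.weight[0][0] = 2.0
trained = inject(backbone, [adapter(c) for c in conditions])
```

With the zero-initialized adapter, `untrained` equals `backbone`, so the pre-trained model's behavior is preserved at the start of training; once a weight is non-zero, the condition shifts the per-frame features. A real adapter would operate on spatial feature maps with convolutional layers rather than flat vectors.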
Taxonomy
Topics: Advanced Vision and Imaging · Video Coding and Compression Technologies · Advanced Image Processing Techniques
Methods: Softmax · Attention Is All You Need · Diffusion
