EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation

Cong Wang; Jiaxi Gu; Panwen Hu; Haoyu Zhao; Yuanfan Guo; Jianhua Han; Hang Xu; Xiaodan Liang

arXiv:2408.13005·cs.CV·September 17, 2024

Open Access

TL;DR

EasyControl introduces a universal framework that enables flexible, multi-condition control of video diffusion models, significantly enhancing generation quality and control precision over existing methods.

Contribution

The paper presents EasyControl, a novel method for integrating various control signals into pre-trained video diffusion models using condition adapters, improving controllability and performance.
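
The idea of injecting a control signal through an adapter can be illustrated with a small sketch. This is not the paper's implementation: the function names (`adapt_condition`, `inject`, `inject_video`), the single linear layer, and the scalar `gate` are all hypothetical, chosen only to show how a condition feature might be added to the hidden features of a frozen model. The gate starting at zero mirrors the zero-initialized layers used by ControlNet, and is assumed here rather than taken from the source.

```python
from typing import List

Vector = List[float]


def adapt_condition(condition: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Hypothetical adapter: one linear layer mapping a control signal to feature space."""
    return [sum(w * c for w, c in zip(row, condition)) + b
            for row, b in zip(weight, bias)]


def inject(hidden: Vector, cond_feature: Vector, gate: float) -> Vector:
    """Add the gated condition feature to the hidden feature of the base model."""
    if len(hidden) != len(cond_feature):
        raise ValueError("hidden and condition features must have the same length")
    return [h + gate * f for h, f in zip(hidden, cond_feature)]


def inject_video(hidden_frames: List[Vector], condition_frames: List[Vector],
                 weight: List[Vector], bias: Vector, gate: float) -> List[Vector]:
    """Apply the same adapter to every frame (an assumption made for this sketch)."""
    return [inject(h, adapt_condition(c, weight, bias), gate)
            for h, c in zip(hidden_frames, condition_frames)]


# Toy example: 2-dimensional control signal, 3-dimensional hidden feature.
weight = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
bias = [0.0, 0.0, 0.5]
hidden_frames = [[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]]
condition_frames = [[1.0, 2.0], [0.0, 0.0]]

unchanged = inject_video(hidden_frames, condition_frames, weight, bias, gate=0.0)
controlled = inject_video(hidden_frames, condition_frames, weight, bias, gate=1.0)
```

With `gate=0.0` the hidden features pass through untouched, so the pre-trained model's behavior is preserved until the adapter is trained; with a non-zero gate the control signal shifts the features.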

Findings

01 Outperforms state-of-the-art methods on public datasets.

02 Improves FVD by 152.0 for sketch-to-video generation (lower FVD is better, so this is a reduction).

03 Demonstrates high fidelity and image retention in generated videos.
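
For readers unfamiliar with the metric, FVD (Fréchet Video Distance) is the Fréchet distance between two Gaussians fitted to features of real and generated videos, so a lower value means the generated distribution is closer to the real one. The sketch below computes that distance with NumPy. The feature extractor (in practice a pre-trained video network) is omitted, the arrays are random stand-ins, and the function name `frechet_distance` is an assumption for illustration, not code from the paper.

```python
import numpy as np


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (n_samples, dim) feature sets.

    d^2 = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2))
    """
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    sigma_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    sigma_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))

    # Tr((S_r S_g)^(1/2)) equals the sum of square roots of the eigenvalues
    # of S_r S_g, which are real and non-negative for covariance matrices.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * trace_sqrt)


# Stand-in features: 200 "videos" with 4-dimensional features each.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))
shifted = real + 3.0  # same covariance, mean moved by 3 in every dimension

same_score = frechet_distance(real, real)
shifted_score = frechet_distance(real, shifted)
```

Identical feature sets give a distance of approximately zero, while shifting every feature by 3 in four dimensions gives approximately 36, the squared distance between the means.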

Abstract

Following the advancements in text-guided image generation technology exemplified by Stable Diffusion, video generation is gaining increased attention in the academic community. However, relying solely on text guidance for video generation has serious limitations, as videos contain much richer content than images, especially in terms of motion. This information can hardly be adequately described with plain text. Fortunately, in computer vision, various visual representations can serve as additional control signals to guide generation. With the help of these signals, video generation can be controlled in finer detail, allowing for greater flexibility for different applications. Integrating various controls, however, is nontrivial. In this paper, we propose a universal framework called EasyControl. By propagating and injecting condition features through condition adapters, our method…

Taxonomy

Topics: Advanced Vision and Imaging · Video Coding and Compression Technologies · Advanced Image Processing Techniques

Methods: Softmax · Attention Is All You Need · Diffusion