SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models

Yuwei Guo; Ceyuan Yang; Anyi Rao; Maneesh Agrawala; Dahua Lin; Bo Dai

arXiv:2311.16933 · cs.CV · November 29, 2023 · 1 citation

TL;DR

SparseCtrl controls text-to-video generation with temporally sparse structural signals, meaning one or a few conditioning frames instead of a dense per-frame sequence. This makes control more flexible and practical without retraining the underlying T2V model.

Contribution

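SparseCtrl enables structure control from minimal inputs. It does this by adding a condition encoder to a pre-trained T2V model, which stays frozen. The encoder works with several modalities, including sketches, depth maps, and RGB images.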
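This page does not say how the condition encoder connects to the frozen model. A common pattern for add-on control, used by ControlNet, is a trainable side branch. Its output is added to the frozen model's features through a zero-initialized projection, so training starts from the unchanged pre-trained behavior.

The sketch below illustrates that pattern under this assumption, using toy matrices in plain Python. `FrozenBlock`, `ConditionEncoder`, and `trainable_modules` are hypothetical names. They are not the AnimateDiff or SparseCtrl API.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product (a toy stand-in for a real network layer)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


@dataclass
class FrozenBlock:
    """Hypothetical stand-in for one block of the pre-trained T2V model."""
    weights: Matrix
    trainable: bool = False  # kept frozen: never updated during training

    def forward(self, h: Vector, residual: Vector) -> Vector:
        # Assumed ControlNet-style injection: encoder features are added as a residual.
        return [a + b for a, b in zip(matvec(self.weights, h), residual)]


@dataclass
class ConditionEncoder:
    """Hypothetical side branch mapping a condition signal to residual features."""
    encode: Matrix
    zero_proj: Matrix  # output projection, zero-initialized
    trainable: bool = True  # the only component that gets optimized

    @classmethod
    def create(cls, encode: Matrix, out_dim: int) -> "ConditionEncoder":
        hidden = len(encode)
        # Zero projection: the branch contributes nothing until it is trained.
        return cls(encode, [[0.0] * hidden for _ in range(out_dim)])

    def forward(self, cond: Vector) -> Vector:
        return matvec(self.zero_proj, matvec(self.encode, cond))


def trainable_modules(*modules) -> list:
    """Return only the modules whose parameters an optimizer would update."""
    return [m for m in modules if m.trainable]
```

At initialization the controlled output equals the frozen block's output, so adding the encoder cannot degrade the pre-trained model before training. Only the encoder is handed to the optimizer. Because the base weights never change, the same encoder could in principle be paired with other checkpoints that share the base architecture. That would be consistent with the generalization finding listed on this page.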

Findings

01. Effective control with sparse signals demonstrated
02. Compatible with multiple modalities like sketches and depth maps
03. Generalizes across different text-to-video models

Abstract

Text-to-video (T2V) generation, i.e., generating videos from a given text prompt, has advanced significantly in recent years. However, relying solely on text prompts often results in ambiguous frame composition due to spatial uncertainty. The research community therefore leverages dense structure signals, e.g., per-frame depth/edge sequences, to enhance controllability, but collecting these signals increases the burden of inference. In this work, we present SparseCtrl to enable flexible structure control with temporally sparse signals, requiring only one or a few inputs, as shown in Figure 1. It incorporates an additional condition encoder to process these sparse signals while leaving the pre-trained T2V model untouched. The proposed approach is compatible with various modalities, including sketches, depth maps, and RGB images, providing more practical control for video…
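Even with temporally sparse control, the condition encoder still needs an input for every frame. One plausible encoding, assumed here rather than taken from this page, works as follows:

- Frames without a signal are zero-filled.
- A binary mask channel is appended to mark which frames carry a real condition.

`build_sparse_condition` is a hypothetical helper that operates on nested lists of single-channel maps, such as depth maps.

```python
from typing import Dict, List

Frame = List[List[float]]  # one single-channel H x W condition map


def build_sparse_condition(
    num_frames: int,
    height: int,
    width: int,
    keyframes: Dict[int, Frame],
) -> List[List[Frame]]:
    """Return per-frame encoder inputs shaped [num_frames][2][height][width].

    Assumed layout (illustrative only):
      channel 0: the condition map, or zeros for frames without a signal
      channel 1: a binary mask, all ones if the frame is conditioned, else all zeros
    """
    for t, frame in keyframes.items():
        if not 0 <= t < num_frames:
            raise ValueError(f"keyframe index {t} outside 0..{num_frames - 1}")
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError(f"keyframe {t} is not {height}x{width}")

    inputs = []
    for t in range(num_frames):
        if t in keyframes:
            cond = [row[:] for row in keyframes[t]]  # copy the provided map
            mask_value = 1.0
        else:
            cond = [[0.0] * width for _ in range(height)]  # unconditioned frame
            mask_value = 0.0
        mask = [[mask_value] * width for _ in range(height)]
        inputs.append([cond, mask])
    return inputs
```

For a 16-frame clip conditioned only on its first and last frames, `keyframes` would hold entries 0 and 15. The mask channel lets the encoder tell an unconditioned frame apart from a condition map that happens to be all zeros.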

Code & Models

Repositories

guoyww/animatediff (PyTorch)

Taxonomy

Topics: Video Analysis and Summarization · Advanced Vision and Imaging · Human Motion and Animation