CCM: Adding Conditional Controls to Text-to-Image Consistency Models

Jie Xiao; Kai Zhu; Han Zhang; Zhiheng Liu; Yujun Shen; Yu Liu; Xueyang Fu; Zheng-Jun Zha

arXiv:2312.06971 · cs.CV · December 13, 2023 · 1 citation

TL;DR

This paper explores how to add conditional controls to text-to-image Consistency Models (CMs), proposing strategies that target both high-level semantic control and low-level detail control, and studies them across a range of conditions.

Contribution

The paper compares three approaches for adding conditional controls to CMs: directly applying a ControlNet trained for diffusion models, training a ControlNet from scratch with Consistency Training, and using a lightweight adapter for multi-condition transfer.
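
All three approaches revolve around a ControlNet-style control branch. In the original ControlNet design (Zhang et al.), a trainable copy of the pretrained network's encoder receives the condition, and its outputs are added back to the frozen network through zero-initialized 1x1 convolutions, so the controlled model initially behaves exactly like the pretrained one. The minimal sketch below uses plain Python lists as stand-ins for feature maps and illustrates only that wiring; `ZeroLinear`, `controlled_block`, and the toy blocks are hypothetical names, not code from the CCM paper.

```python
from typing import Callable, List

Vector = List[float]
Block = Callable[[Vector], Vector]


class ZeroLinear:
    """Stand-in for a 1x1 'zero convolution': weights and bias start at zero."""

    def __init__(self, dim: int) -> None:
        self.weight = [[0.0] * dim for _ in range(dim)]
        self.bias = [0.0] * dim

    def __call__(self, v: Vector) -> Vector:
        return [sum(w * x for w, x in zip(row, v)) + b
                for row, b in zip(self.weight, self.bias)]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def controlled_block(frozen: Block, trainable_copy: Block,
                     zero_in: ZeroLinear, zero_out: ZeroLinear,
                     x: Vector, cond: Vector) -> Vector:
    """ControlNet-style wiring: frozen(x) + zero_out(copy(x + zero_in(cond)))."""
    base = frozen(x)                                 # pretrained path, left untouched
    control = trainable_copy(add(x, zero_in(cond)))  # trainable copy sees the condition
    return add(base, zero_out(control))              # residual added via zero-init layer


# Toy usage: at initialization the output equals the frozen block's output.
toy_frozen = lambda v: [2.0 * a for a in v]
toy_copy = lambda v: [a + 1.0 for a in v]
x, cond = [1.0, -0.5, 3.0], [0.2, 0.4, 0.6]
print(controlled_block(toy_frozen, toy_copy, ZeroLinear(3), ZeroLinear(3), x, cond))
# prints [2.0, -1.0, 6.0]
```

Starting the projections at zero does not block learning: the gradient with respect to the output projection's weights depends on the copy's (nonzero) activations, so training can grow the control signal from nothing without disturbing the pretrained model at step zero.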

Findings

01

A ControlNet trained for diffusion models can be applied directly to CMs for high-level semantic control, but it struggles with low-level detail and realism.

02

A ControlNet can be trained from scratch on top of a CM using Consistency Training.

03

A lightweight adapter, jointly optimized under multiple conditions, enables fast transfer of diffusion-model ControlNets to CMs.
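
Finding 02 treats the CM as a generative model in its own right. For reference, Song et al.'s Consistency Models paper parameterizes the consistency function as f(x, t) = c_skip(t) x + c_out(t) F(x, t), choosing c_skip and c_out so that f(x, ε) = x at the smallest time ε (the boundary condition). The sketch assumes that paper's default constants σ_data = 0.5 and ε = 0.002; the text-to-image CM used in CCM may use a different parameterization, and `net` is a hypothetical stand-in for the conditioned backbone (in a ControlNet setup, the control residual would enter inside it).

```python
import math
from typing import Callable, List

Vector = List[float]
# Hypothetical backbone signature: net(x, t, cond) -> raw output F(x, t).
Backbone = Callable[[Vector, float, Vector], Vector]

SIGMA_DATA = 0.5  # assumed data standard deviation (default in Song et al.)
EPS = 0.002       # assumed smallest time, where the boundary condition holds


def c_skip(t: float) -> float:
    return SIGMA_DATA ** 2 / ((t - EPS) ** 2 + SIGMA_DATA ** 2)


def c_out(t: float) -> float:
    return SIGMA_DATA * (t - EPS) / math.sqrt(SIGMA_DATA ** 2 + t ** 2)


def consistency_fn(net: Backbone, x: Vector, t: float, cond: Vector) -> Vector:
    """f(x, t) = c_skip(t) * x + c_out(t) * F(x, t); reduces to x at t = EPS."""
    raw = net(x, t, cond)
    return [c_skip(t) * xi + c_out(t) * ri for xi, ri in zip(x, raw)]


# Toy usage: whatever the backbone outputs, f(x, EPS) returns x unchanged.
toy_net = lambda x, t, cond: [xi - ci for xi, ci in zip(x, cond)]
x, cond = [0.3, -1.2], [0.1, 0.1]
print(consistency_fn(toy_net, x, EPS, cond))  # prints [0.3, -1.2]
```

Baking the boundary condition into the parameterization means no loss term is needed to enforce it; at large t, c_skip approaches 0 and the output is dominated by the network.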

Abstract

Consistency Models (CMs) have shown promise in creating visual content efficiently and with high quality. However, how to add new conditional controls to pretrained CMs has not yet been explored. In this technical report, we consider alternative strategies for adding ControlNet-like conditional control to CMs and present three significant findings. 1) ControlNet trained for diffusion models (DMs) can be directly applied to CMs for high-level semantic controls but struggles with low-level detail and realism control. 2) CMs serve as an independent class of generative models, on which ControlNet can be trained from scratch using the Consistency Training proposed by Song et al. 3) A lightweight adapter can be jointly optimized under multiple conditions through Consistency Training, allowing DM-based ControlNets to be transferred swiftly to CMs. We study these three solutions across…
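
Findings 2) and 3) both rely on Consistency Training (CT), which Song et al. propose for training CMs without a pretrained diffusion teacher. In CT, a clean sample x is noised with the same Gaussian z at two adjacent times t_n < t_{n+1}, and the online model's output at t_{n+1} is pulled toward a target model's output at t_n, where the target's weights are an exponential moving average (EMA) of the online weights. The sketch below computes one loss sample, assuming the Karras et al. time discretization (ρ = 7, T = 80), a squared-L2 distance, and unit loss weighting (Song et al. also consider LPIPS, and the growing schedule for N is omitted). How CCM weights the loss or feeds the control condition `cond` is not specified here, so treat those choices as assumptions.

```python
import random
from typing import Callable, List

Vector = List[float]
# f(x, t, cond) -> denoised estimate; stands in for a conditioned consistency model.
ConsistencyFn = Callable[[Vector, float, Vector], Vector]


def karras_timesteps(n: int, eps: float = 0.002, t_max: float = 80.0,
                     rho: float = 7.0) -> List[float]:
    """t_i = (eps^(1/rho) + i/(n-1) * (t_max^(1/rho) - eps^(1/rho)))^rho, i = 0..n-1."""
    lo, hi = eps ** (1 / rho), t_max ** (1 / rho)
    return [(lo + i / (n - 1) * (hi - lo)) ** rho for i in range(n)]


def ct_loss(f_online: ConsistencyFn, f_target: ConsistencyFn, x: Vector,
            cond: Vector, timesteps: List[float], rng: random.Random) -> float:
    """One Consistency Training loss sample (squared-L2 distance, unit weight)."""
    n = rng.randrange(len(timesteps) - 1)   # pick an adjacent pair (t_n, t_{n+1})
    t_lo, t_hi = timesteps[n], timesteps[n + 1]
    z = [rng.gauss(0.0, 1.0) for _ in x]    # the SAME noise is used at both times
    pred = f_online([xi + t_hi * zi for xi, zi in zip(x, z)], t_hi, cond)
    # Target branch: wrapped in stop-gradient in an autodiff framework.
    target = f_target([xi + t_lo * zi for xi, zi in zip(x, z)], t_lo, cond)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(x)


def ema_update(target: List[float], online: List[float], mu: float) -> List[float]:
    """theta_target <- mu * theta_target + (1 - mu) * theta_online."""
    return [mu * tp + (1.0 - mu) * op for tp, op in zip(target, online)]


ts = karras_timesteps(18)
identity = lambda x, t, cond: list(x)
print(ct_loss(identity, identity, [0.5, -0.5], [1.0], ts, random.Random(0)))
```

Because the target only sees the less-noised point, minimizing this loss pushes outputs along each noise trajectory toward agreement, which is what lets a CM (and a control branch attached to it) be trained without distilling from a diffusion model.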

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques

Methods: Diffusion · Adapter