Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models
Shihao Zhao, Dongdong Chen, Yen-Chun Chen, Jianmin Bao, Shaozhe Hao, Lu Yuan, Kwan-Yee K. Wong

TL;DR
Uni-ControlNet is a versatile framework that enables simultaneous use of multiple local and global controls in text-to-image diffusion models, improving controllability and efficiency with minimal fine-tuning.
Contribution
It introduces a unified, composable control framework requiring only two adapters, reducing fine-tuning costs and enhancing control over image generation.
Findings
Outperforms existing methods in controllability and quality.
Requires only two adapters regardless of control types.
Demonstrates superior composability and efficiency.
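The claim that two adapters suffice regardless of control types implies that one adapter accepts any subset of the local controls. The summary does not say how the controls are combined, so the following is only a minimal sketch of one plausible scheme: each control type gets a fixed channel slot, and absent controls are zero-filled. The slot names, their order, and the function compose_local_controls are hypothetical and are not taken from the paper or its code.

```python
from typing import Dict, List

# Hypothetical fixed ordering of local control types (an assumption for illustration).
LOCAL_SLOTS = ["edge", "depth", "segmentation"]

Grid = List[List[float]]  # one H x W channel stored as nested lists


def zeros(height: int, width: int) -> Grid:
    """Return an all-zero H x W channel."""
    return [[0.0] * width for _ in range(height)]


def compose_local_controls(controls: Dict[str, Grid], height: int, width: int) -> List[Grid]:
    """Stack the supplied local controls into a fixed channel layout.

    Control types that are not supplied are replaced by zero channels, so the
    output always has len(LOCAL_SLOTS) channels and a single adapter could
    consume it unchanged.
    """
    unknown = set(controls) - set(LOCAL_SLOTS)
    if unknown:
        raise ValueError(f"unknown control types: {sorted(unknown)}")

    stacked: List[Grid] = []
    for name in LOCAL_SLOTS:
        grid = controls.get(name)
        if grid is None:
            grid = zeros(height, width)  # absent control: zero-filled slot
        elif len(grid) != height or any(len(row) != width for row in grid):
            raise ValueError(f"control '{name}' does not match {height}x{width}")
        stacked.append(grid)
    return stacked


# Usage: only an edge map is supplied; the depth and segmentation slots are zero-filled.
edge_map = [[1.0, 0.0], [0.0, 1.0]]
stacked = compose_local_controls({"edge": edge_map}, height=2, width=2)
```

Under this assumption, adding or dropping a control changes only the contents of the stacked input, not its shape, which is one way a single adapter could stay fixed as control combinations vary.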
Abstract
Text-to-Image diffusion models have made tremendous progress over the past two years, enabling the generation of highly realistic images based on open-domain text descriptions. However, despite their success, text descriptions often struggle to adequately convey detailed controls, even when composed of long and complex texts. Moreover, recent studies have also shown that these models face challenges in understanding such complex texts and generating the corresponding images. Therefore, there is a growing need to enable more control modes beyond text description. In this paper, we introduce Uni-ControlNet, a unified framework that allows for the simultaneous utilization of different local controls (e.g., edge maps, depth maps, segmentation masks) and global controls (e.g., CLIP image embeddings) in a flexible and composable manner within one single model. Unlike existing methods,…
Taxonomy
Topics: Mycobacterium research and diagnosis · Advanced Neuroimaging Techniques and Applications · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion · Contrastive Language-Image Pre-training · Adapter
