UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild
Can Qin, Shu Zhang, Ning Yu, Yihao Feng, Xinyi Yang, Yingbo Zhou, Huan Wang, Juan Carlos Niebles, Caiming Xiong, Silvio Savarese, Stefano Ermon, Yun Fu, Ran Xu

TL;DR
UniControl is a unified diffusion model that enables precise, controllable image generation with diverse visual conditions and language prompts, surpassing single-task methods in versatility and zero-shot performance.
Contribution
It augments pretrained diffusion models with a task-aware HyperNet, so that a single framework can handle multiple visual control tasks.
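The summary does not say how the HyperNet conditions the diffusion model on the task. The sketch below assumes a FiLM-style design: each task has an embedding, which is mapped to a per-channel scale and shift that are then applied to shared features. It shows how one set of features can be steered differently per control task. `TaskHyperNet`, `modulate`, and the task names are hypothetical illustrations, not UniControl's code.

```python
import random
from typing import Dict, List, Tuple

Channels = List[List[float]]  # hypothetical: list of channels, each a list of spatial values


class TaskHyperNet:
    """Hypothetical task-aware hypernetwork (FiLM-style assumption):
    maps a task name to per-channel (scale, shift) parameters."""

    def __init__(self, tasks: List[str], num_channels: int, embed_dim: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.num_channels = num_channels
        # One embedding per task (randomly initialised; learned in a real model).
        self.task_embed: Dict[str, List[float]] = {
            t: [rng.gauss(0.0, 1.0) for _ in range(embed_dim)] for t in tasks
        }
        # Linear projections from the task embedding to per-channel scale and shift.
        self.w_scale = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)] for _ in range(num_channels)]
        self.w_shift = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)] for _ in range(num_channels)]

    def params(self, task: str) -> Tuple[List[float], List[float]]:
        e = self.task_embed[task]
        # Scale is 1 + delta so an untrained hypernet starts close to the identity.
        scale = [1.0 + sum(w * x for w, x in zip(row, e)) for row in self.w_scale]
        shift = [sum(w * x for w, x in zip(row, e)) for row in self.w_shift]
        return scale, shift


def modulate(features: Channels, scale: List[float], shift: List[float]) -> Channels:
    """Apply a per-channel affine transform: out = scale * x + shift."""
    return [[s * v + b for v in channel] for channel, s, b in zip(features, scale, shift)]


if __name__ == "__main__":
    hyper = TaskHyperNet(["canny", "depth", "segmentation"], num_channels=3)
    feats = [[1.0, 2.0], [0.5, -0.5], [0.0, 3.0]]
    # The same shared features are modulated differently for each task.
    print(modulate(feats, *hyper.params("canny")))
    print(modulate(feats, *hyper.params("depth")))
```

Initialising the scale at 1 plus a small offset keeps the shared features nearly unchanged until task-specific parameters are learned, which is a common choice for conditioning layers added to a pretrained network.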
Findings
Outperforms single-task controllable generation methods of comparable model size.
Demonstrates strong zero-shot generalization to unseen visual conditions.
Enables pixel-level precise image generation guided by visual and language inputs.
Abstract
Achieving machine autonomy and human control often represent divergent objectives in the design of interactive AI systems. Visual generative foundation models such as Stable Diffusion show promise in navigating these goals, especially when prompted with arbitrary language. However, they often fall short in generating images with spatial, structural, or geometric controls. The integration of such controls, which can accommodate various visual conditions in a single unified model, remains an unaddressed challenge. In response, we introduce UniControl, a new generative foundation model that consolidates a wide array of controllable condition-to-image (C2I) tasks within a single framework, while still allowing for arbitrary language prompts. UniControl enables pixel-level-precise image generation, where visual conditions primarily influence the generated structures and language prompts…
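The abstract excerpt states that many C2I tasks share one framework but not how heterogeneous conditions (edge maps, depth maps, segmentation maps, and so on) enter it. One common pattern, assumed here rather than taken from the excerpt, is to give each condition type a small adapter that maps it into a common feature space and then pass the result through one shared encoder. The adapters, the task keys, and `encode_condition` are hypothetical toy stand-ins for learned layers.

```python
from typing import Callable, Dict, List

ConditionMap = List[List[float]]  # hypothetical single-channel H x W condition map
Features = List[float]


def _flatten(cond: ConditionMap) -> Features:
    return [v for row in cond for v in row]


# Hypothetical condition-specific adapters. In a real model each would be a
# small learned network; here they are fixed toy transforms that map every
# condition type into the same flat feature space.
ADAPTERS: Dict[str, Callable[[ConditionMap], Features]] = {
    "canny": lambda c: [1.0 if v > 0.5 else 0.0 for v in _flatten(c)],
    "depth": lambda c: [1.0 - v for v in _flatten(c)],
    "segmentation": lambda c: [v / 10.0 for v in _flatten(c)],
}


def shared_encoder(x: Features) -> Features:
    """Stand-in for a single encoder reused by every task (here: mean-centering)."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def encode_condition(task: str, cond: ConditionMap) -> Features:
    """Route a condition through its type-specific adapter, then the shared encoder."""
    if task not in ADAPTERS:
        raise KeyError(f"no adapter registered for condition type {task!r}")
    return shared_encoder(ADAPTERS[task](cond))


if __name__ == "__main__":
    print(encode_condition("canny", [[0.2, 0.9], [0.7, 0.1]]))
    print(encode_condition("depth", [[0.25, 0.75]]))
```

Under this assumption, supporting a new condition type only requires registering another adapter; the shared encoder is reused unchanged.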
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Diffusion
