UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild

Can Qin; Shu Zhang; Ning Yu; Yihao Feng; Xinyi Yang; Yingbo Zhou; Huan Wang; Juan Carlos Niebles; Caiming Xiong; Silvio Savarese; Stefano Ermon; Yun Fu; Ran Xu

arXiv:2305.11147·cs.CV·November 3, 2023·24 cites

TL;DR

UniControl is a unified diffusion model that enables precise, controllable image generation with diverse visual conditions and language prompts, surpassing single-task methods in versatility and zero-shot performance.

Contribution

It introduces a task-aware HyperNet and augments pretrained diffusion models to handle multiple visual control tasks within a single framework.
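
As a rough illustration of the idea of task-aware modulation, the sketch below maps a task embedding through a tiny "hypernetwork" to per-channel scales that modulate shared control features. Everything here is an assumption made for illustration: the function names (`make_hypernet`, `task_scales`, `modulate`), the embedding values, the `1 + tanh` scaling, and the layer sizes are hypothetical and are not taken from the paper or the official repository.

```python
import math
import random


def make_hypernet(embed_dim, num_channels, seed=0):
    """Build a toy linear hypernetwork: one weight row and one bias per channel."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)]
               for _ in range(num_channels)]
    bias = [0.0] * num_channels
    return weights, bias


def task_scales(hypernet, task_embedding):
    """Map a task embedding to one scale per channel, each in (0, 2)."""
    weights, bias = hypernet
    return [
        1.0 + math.tanh(sum(w * e for w, e in zip(row, task_embedding)) + b)
        for row, b in zip(weights, bias)
    ]


def modulate(features, scales):
    """Multiply every value in a channel by that channel's task-dependent scale."""
    return [[s * v for v in channel] for channel, s in zip(features, scales)]


# Hypothetical task embeddings; a real system would learn or encode these.
TASK_EMBEDDINGS = {
    "canny": [1.0, 0.0, 0.0],
    "depth": [0.0, 1.0, 0.0],
}

hypernet = make_hypernet(embed_dim=3, num_channels=4)

# Toy control features: 4 channels with 2 values each.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

canny_out = modulate(features, task_scales(hypernet, TASK_EMBEDDINGS["canny"]))
depth_out = modulate(features, task_scales(hypernet, TASK_EMBEDDINGS["depth"]))
```

The point of the sketch is only that one shared set of features can be steered differently per task by a small network conditioned on the task, rather than training a separate model for each condition type.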

Findings

01. Outperforms single-task controlled methods of similar size.

02. Demonstrates strong zero-shot generalization to unseen visual conditions.

03. Enables pixel-level precise image generation guided by visual and language inputs.

Abstract

Achieving machine autonomy and human control often represent divergent objectives in the design of interactive AI systems. Visual generative foundation models such as Stable Diffusion show promise in navigating these goals, especially when prompted with arbitrary languages. However, they often fall short in generating images with spatial, structural, or geometric controls. The integration of such controls, which can accommodate various visual conditions in a single unified model, remains an unaddressed challenge. In response, we introduce UniControl, a new generative foundation model that consolidates a wide array of controllable condition-to-image (C2I) tasks within a singular framework, while still allowing for arbitrary language prompts. UniControl enables pixel-level-precise image generation, where visual conditions primarily influence the generated structures and language prompts…

Code & Models

Repositories

salesforce/unicontrol · PyTorch · Official

Models

🤗 ModelsLab/unicontrol-v1.1 · model · 9 dl

Datasets

limingcv/MultiGen-20M_train · dataset · 232 dl

Videos

UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Diffusion