OminiControl: Minimal and Universal Control for Diffusion Transformer
Zhenxiong Tan, Songhua Liu, Xingyi Yang, Qiaochu Xue, Xinchao Wang

TL;DR
OminiControl is a minimal, versatile control method for Diffusion Transformers (DiT). It adds only 0.1% extra parameters, yet matches or surpasses specialized control methods across a range of image-conditioning tasks.
Contribution
The paper proposes a unified, parameter-efficient control approach that reuses existing DiT components, and it introduces Subjects200K, a large-scale dataset for subject-driven image generation.
Findings
- Matches or surpasses specialized control methods in multiple tasks
- Uses only 0.1% extra parameters for control mechanisms
- Introduces the Subjects200K dataset of identity-consistent image pairs
Abstract
We present OminiControl, a novel approach that rethinks how image conditions are integrated into Diffusion Transformer (DiT) architectures. Current image conditioning methods either introduce substantial parameter overhead or handle only specific control tasks effectively, limiting their practical versatility. OminiControl addresses these limitations through three key innovations: (1) a minimal architectural design that leverages the DiT's own VAE encoder and transformer blocks, requiring just 0.1% additional parameters; (2) a unified sequence processing strategy that combines condition tokens with image tokens for flexible token interactions; and (3) a dynamic position encoding mechanism that adapts to both spatially-aligned and non-aligned control tasks. Our extensive experiments show that this streamlined approach not only matches but surpasses the performance of specialized methods…
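To make points (2) and (3) of the abstract concrete, here is a minimal sketch of one way condition tokens could be appended to image tokens and assigned position indices. The abstract does not specify the indexing scheme, so everything below is an assumption for illustration: the function names `grid_positions` and `build_unified_sequence`, the choice to reuse image positions for spatially-aligned conditions, and the choice to shift non-aligned conditions by one grid width.

```python
from typing import List, Tuple

Position = Tuple[int, int]
Token = Tuple[str, Position]


def grid_positions(height: int, width: int, offset: Position = (0, 0)) -> List[Position]:
    """Row-major (row, col) indices for a height x width token grid, shifted by offset."""
    d_row, d_col = offset
    return [(r + d_row, c + d_col) for r in range(height) for c in range(width)]


def build_unified_sequence(height: int, width: int, spatially_aligned: bool) -> List[Token]:
    """Concatenate image tokens and condition tokens into one labeled sequence.

    Assumption (not stated in the abstract): aligned conditions such as edge
    or depth maps share the image grid's positions, while non-aligned
    conditions such as a subject reference are shifted so they do not overlap.
    """
    image_positions = grid_positions(height, width)
    condition_offset = (0, 0) if spatially_aligned else (0, width)
    condition_positions = grid_positions(height, width, condition_offset)

    sequence = [("image", p) for p in image_positions]
    sequence += [("condition", p) for p in condition_positions]
    return sequence


if __name__ == "__main__":
    for aligned in (True, False):
        seq = build_unified_sequence(2, 3, spatially_aligned=aligned)
        print("aligned" if aligned else "non-aligned", seq)
```

In this sketch the transformer would attend over the single concatenated sequence, so no separate control branch is needed; only the position indices change between the two kinds of task.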
Code & Models
- 🤗 Yuanshi/OminiControl · model · 1.2k dl · ♡ 146
- 🤗 Yuanshi/OminiControlArt · model · 54 dl · ♡ 1
- 🤗 neta-art/Neta-Lumina · model · 3.7k dl · ♡ 319
- 🤗 Duplicate-repo/neta-lumina-beta-0624 · model
- 🤗 Duplicate-repo/Neta-Lumina · model · 5 dl
- 🤗 VirtualAddressExtension/Neta-Lumina-v1.0-diffusers · model · 7 dl
- 🤗 nvan15/OminiControlRotation · model
Taxonomy
Topics: Analog and Mixed-Signal Circuit Design
Methods: Attention Is All You Need · Dense Connections · Label Smoothing · Dropout · Linear Layer · Layer Normalization · Byte Pair Encoding · Adam · Residual Connection · Softmax
