OmniControlNet: Dual-stage Integration for Conditional Image Generation
Yilin Wang, Haiyang Xu, Xiang Zhang, Zeyuan Chen, Zhizhou Sha, Zirui Wang, Zhuowen Tu

TL;DR
OmniControlNet introduces a unified model that consolidates multiple conditioning methods into a single, efficient framework for conditional image generation, reducing redundancy while maintaining high-quality outputs.
Contribution
It integrates external condition generation and multiple conditioning types into one multi-task model guided by task and text embeddings, improving efficiency over traditional two-stage pipelines.
Findings
Reduces model complexity and redundancy.
Produces high-quality conditioned images comparable to existing methods.
Unifies multiple conditioning inputs into a single model.
Abstract
We provide a two-way integration for the widely adopted ControlNet: we integrate external condition generation algorithms into a single dense prediction method, and we incorporate its individually trained image generation processes into a single model. Despite its tremendous success, ControlNet's two-stage pipeline has two limitations: it is not self-contained (e.g., it calls external condition generation algorithms), and it carries large model redundancy (separately trained models for different types of conditioning inputs). Our proposed OmniControlNet consolidates 1) condition generation (e.g., HED edges, depth maps, user scribbles, and animal pose) into a single multi-task dense prediction algorithm under task embedding guidance, and 2) the image generation process for different conditioning types under textual embedding guidance. OmniControlNet achieves significantly reduced…
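The abstract does not specify how the task embedding steers the shared dense predictor. The following is a toy sketch, under stated assumptions, of the general idea behind the first stage: one network serves several condition types, and a per-task embedding selects how shared features are read out. This replaces the separate per-condition models of a two-stage pipeline. The names `TASK_EMBEDDINGS`, `shared_features`, and `dense_predict`, the two-feature "backbone", the sigmoid readout, and the hand-picked embedding values are all hypothetical illustrations, not OmniControlNet's actual architecture.

```python
import math
from typing import Dict, List, Tuple

Image = List[List[float]]  # toy grayscale image, pixel values in [0, 1]

# Hypothetical task embeddings: (weight on intensity, weight on gradient, bias).
# A real model would learn high-dimensional vectors; these are hand-picked toys.
TASK_EMBEDDINGS: Dict[str, Tuple[float, float, float]] = {
    "hed": (0.0, 8.0, -2.0),    # edge-like map: responds to intensity changes
    "depth": (4.0, 0.0, -2.0),  # toy "depth" proxy: responds to brightness only
}


def shared_features(img: Image) -> List[List[Tuple[float, float]]]:
    """Shared backbone stand-in: per-pixel (intensity, horizontal gradient magnitude)."""
    feats = []
    for row in img:
        feat_row = []
        for x, value in enumerate(row):
            left = row[x - 1] if x > 0 else value  # no gradient at the left border
            feat_row.append((value, abs(value - left)))
        feats.append(feat_row)
    return feats


def dense_predict(img: Image, task: str) -> Image:
    """Single multi-task predictor: the task embedding picks the readout of shared features."""
    w_int, w_grad, bias = TASK_EMBEDDINGS[task]
    out = []
    for feat_row in shared_features(img):
        out.append([
            1.0 / (1.0 + math.exp(-(w_int * i + w_grad * g + bias)))  # sigmoid head
            for i, g in feat_row
        ])
    return out


if __name__ == "__main__":
    step = [[0.0, 0.0, 1.0, 1.0]] * 2  # dark-to-bright step edge
    for name in TASK_EMBEDDINGS:
        print(name, [round(v, 2) for v in dense_predict(step, name)[0]])
```

Changing the task name changes the predicted condition map without loading a different model. This is the kind of redundancy reduction the abstract describes for condition generation. The second, text-embedding-guided image generation stage is not modeled in this sketch.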
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis
