Condition Weaving Meets Expert Modulation: Towards Universal and Controllable Image Generation

Guoqing Zhang; Xingtong Ge; Lu Shi; Xin Zhang; Muqing Xue; Wanru Xu; Yigang Cen; Yidong Li

arXiv:2508.17364·cs.CV·December 16, 2025

TL;DR

This paper introduces UniGen, a unified framework for controllable image generation that handles diverse condition types in one model instead of training a separate control branch per condition. Two new modules, CoMoE and WeaveNet, reduce redundancy and improve efficiency.

Contribution

The paper proposes CoMoE and WeaveNet modules to unify conditional image generation, reducing model redundancy and improving interaction between control signals.

Findings

01 Achieves state-of-the-art performance on multiple datasets.

02 Effectively mitigates feature entanglement and redundancy.

03 Enhances versatility and efficiency in controllable image generation.

Abstract

The image-to-image generation task aims to produce controllable images by leveraging conditional inputs and prompt instructions. However, existing methods often train separate control branches for each type of condition, leading to redundant model structures and inefficient use of computational resources. To address this, we propose a Unified image-to-image Generation (UniGen) framework that supports diverse conditional inputs while enhancing generation efficiency and expressiveness. Specifically, to tackle the widely existing parameter redundancy and computational inefficiency in controllable conditional generation architectures, we propose the Condition Modulated Expert (CoMoE) module. This module aggregates semantically similar patch features and assigns them to dedicated expert modules for visual representation and conditional modeling. By enabling independent modeling of foreground…
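
The routing idea the abstract attributes to CoMoE (group semantically similar patch features, then hand each group to its own expert) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name `route_patches_to_experts`, the use of cosine similarity against fixed centroids, and the choice of one linear map per expert are all assumptions made for illustration, and the sketch omits the condition modulation entirely.

```python
import numpy as np

def route_patches_to_experts(patches, centroids, expert_weights):
    """Send each patch to the expert whose centroid it most resembles.

    Hypothetical helper, not taken from the paper.
    patches:        (N, D) patch features
    centroids:      (K, D) one assumed prototype vector per expert
    expert_weights: (K, D, D) one linear map per expert
    Returns (outputs of shape (N, D), assignments of shape (N,)).
    """
    # Cosine similarity between every patch and every centroid.
    p = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    assignments = np.argmax(p @ c.T, axis=1)

    outputs = np.zeros_like(patches)
    for k in range(len(centroids)):
        mask = assignments == k
        # Only the patches grouped under expert k pass through its weights.
        outputs[mask] = patches[mask] @ expert_weights[k]
    return outputs, assignments

# Toy inputs: 16 patches of dimension 8, routed across 4 experts.
rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))
centroids = rng.normal(size=(4, 8))
expert_weights = rng.normal(size=(4, 8, 8))
outputs, assignments = route_patches_to_experts(patches, centroids, expert_weights)
```

Because each patch touches only one expert's weights, the per-patch compute stays constant as experts are added, which is the general appeal of expert routing; how UniGen actually forms the groups and conditions the experts is described in the full paper, not here.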
