InstructMoLE: Instruction-Guided Mixture of Low-rank Experts for Multi-Conditional Image Generation

Jinqi Xiao; Qing Yan; Liming Jiang; Zichuan Liu; Hao Kang; Shen Sang; Tiancheng Zhi; Jing Liu; Cheng Yang; Xin Lu; Bo Yuan

arXiv:2512.21788·cs.CV·May 5, 2026

TL;DR

InstructMoLE introduces an instruction-guided routing mechanism for Mixture of Low-rank Experts (MoLE) adapters in Diffusion Transformers, improving the fidelity and coherence of multi-conditional image generation.

Contribution

It proposes a global instruction-guided routing strategy and an orthogonality loss to enhance expert diversity and global semantic consistency in image generation.
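As a rough illustration of what an orthogonality loss over experts can look like, the sketch below penalizes the squared cosine similarity between every pair of experts. The exact formulation used in the paper is not given on this page, so the function name `orthogonality_loss`, the choice to compare flattened per-expert vectors, and the normalization are all assumptions made for illustration.

```python
import numpy as np

def orthogonality_loss(expert_vecs, eps=1e-8):
    """Mean squared off-diagonal cosine similarity between experts.

    expert_vecs: array of shape (num_experts, dim), one flattened vector
    per expert (hypothetical representation, e.g. flattened LoRA weights).
    """
    # Normalize each expert vector so the Gram matrix holds cosine similarities.
    norms = np.linalg.norm(expert_vecs, axis=1, keepdims=True)
    unit = expert_vecs / (norms + eps)
    gram = unit @ unit.T
    n = gram.shape[0]
    # Zero the diagonal: an expert's similarity with itself is not penalized.
    off_diag = gram - np.diag(np.diag(gram))
    return float((off_diag ** 2).sum() / (n * (n - 1)))

# Toy inputs: three mutually orthogonal experts vs. three identical experts.
orthogonal = np.eye(3, 5)
identical = np.ones((3, 5))
loss_orthogonal = orthogonality_loss(orthogonal)
loss_identical = orthogonality_loss(identical)
```

Under this assumed definition the loss is zero when the experts are mutually orthogonal and approaches one when they are all identical, so minimizing it pushes experts toward distinct directions.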

Findings

1. Outperforms existing LoRA adapters and MoLE variants on multi-conditional benchmarks.
2. Reduces artifacts like spatial fragmentation and semantic drift.
3. Enhances compositional control and fidelity to user instructions.

Abstract

Parameter-Efficient Fine-Tuning of Diffusion Transformers (DiTs) for diverse, multi-conditional tasks often suffers from task interference when using monolithic adapters like LoRA. The Mixture of Low-rank Experts (MoLE) architecture offers a modular solution, but its potential is usually limited by routing policies that operate at a token level. Such local routing can conflict with the global nature of user instructions, leading to artifacts like spatial fragmentation and semantic drift in complex image generation tasks. To address these limitations, we introduce InstructMoLE, a novel framework that employs an Instruction-Guided Mixture of Low-Rank Experts. Instead of per-token routing, InstructMoLE utilizes a global routing signal, Instruction-Guided Routing (IGR), derived from the user's comprehensive instruction. This ensures that a single, coherently chosen expert council is applied…
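The contrast between per-token routing and a single global routing decision can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the pooled instruction embedding `instr_emb`, the linear router `W_router`, the top-k selection with softmax renormalization, and all shapes are assumptions chosen for the example.

```python
import numpy as np

def lora_expert(x, A, B):
    # Low-rank update: x is (tokens, d), A is (d, r), B is (r, d).
    return x @ A @ B

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def global_route(instr_emb, W_router, k):
    """Choose one top-k expert set from a pooled instruction embedding."""
    logits = instr_emb @ W_router            # (num_experts,)
    top = np.argsort(logits)[-k:][::-1]      # indices of the k largest logits
    weights = softmax(logits[top])           # renormalize over the chosen experts
    return top, weights

def mole_layer_global(x, instr_emb, W_router, experts, k=2):
    """Apply the same instruction-selected experts to every token."""
    top, weights = global_route(instr_emb, W_router, k)
    delta = sum(w * lora_expert(x, *experts[i]) for i, w in zip(top, weights))
    return x + delta, top

def token_route(x, W_token_router, k):
    """Per-token baseline: each token picks its own top-k experts."""
    logits = x @ W_token_router              # (tokens, num_experts)
    return np.argsort(logits, axis=-1)[:, -k:]

# Toy setup with hypothetical sizes.
rng = np.random.default_rng(0)
d, r, n_experts, n_tokens = 16, 4, 4, 8
experts = [(rng.normal(size=(d, r)) * 0.1, rng.normal(size=(r, d)) * 0.1)
           for _ in range(n_experts)]
W_router = rng.normal(size=(d, n_experts))
x = rng.normal(size=(n_tokens, d))
instr_emb = rng.normal(size=d)

y, chosen = mole_layer_global(x, instr_emb, W_router, experts, k=2)
per_token_choice = token_route(x, W_router, k=2)
```

In the global variant `chosen` holds one expert set shared by all tokens, whereas `per_token_choice` has one row per token and the rows may disagree, which is the kind of local inconsistency the abstract associates with spatial fragmentation.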
