MoEController: Instruction-based Arbitrary Image Manipulation with Mixture-of-Expert Controllers
Sijia Li, Chen Chen, Haonan Lu

TL;DR
This paper introduces MoEController, a diffusion model with mixture-of-expert (MOE) controllers that performs zero-shot, open-domain, text-guided image manipulation, covering both global and local edits driven by natural language instructions.
Contribution
The paper presents an MOE-based diffusion model for zero-shot image editing guided by natural language instructions. It is trained on a generated global image transfer dataset together with an instruction-based local image editing dataset.
Findings
High-quality global and local image editing results.
Effective handling of diverse open-domain images and instructions.
State-of-the-art performance in zero-shot image manipulation tasks.
Abstract
Diffusion-model-based text-guided image generation has recently made astounding progress, producing fascinating results in open-domain image manipulation tasks. Few models, however, currently have complete zero-shot capabilities for both global and local image editing, owing to the complexity and diversity of image manipulation tasks. In this work, we propose a method with mixture-of-expert (MOE) controllers to align the text-guided capacity of diffusion models with different kinds of human instructions, enabling our model to handle various open-domain image manipulation tasks with natural language instructions. First, we use large language models (ChatGPT) and conditional image synthesis models (ControlNet) to generate a large-scale global image transfer dataset in addition to the instruction-based local image editing dataset. Then, using an MOE technique and task-specific…
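The abstract is cut off before it describes how the experts are combined, so the following is only a generic illustration of soft mixture-of-experts gating, not the paper's architecture. It assumes an instruction embedding is scored against one gate row per expert, the scores are normalized with a softmax, and the expert outputs are blended by those weights. All names here (`gate_weights`, `moe_controller`, `global_style_expert`, `local_edit_expert`) and the toy experts are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Sequence[float]], Vector]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(instruction_emb: Sequence[float],
                 gate_matrix: Sequence[Sequence[float]]) -> Vector:
    """Score each expert by a dot product with the embedding, then normalize."""
    scores = [sum(w * x for w, x in zip(row, instruction_emb))
              for row in gate_matrix]
    return softmax(scores)


def moe_controller(instruction_emb: Sequence[float],
                   gate_matrix: Sequence[Sequence[float]],
                   experts: Sequence[Expert]) -> Vector:
    """Blend expert outputs using the softmax gate weights."""
    weights = gate_weights(instruction_emb, gate_matrix)
    outputs = [expert(instruction_emb) for expert in experts]
    dim = len(outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(dim)]


# Hypothetical toy experts standing in for task-specific controllers.
def global_style_expert(emb: Sequence[float]) -> Vector:
    return [2.0 * x for x in emb]


def local_edit_expert(emb: Sequence[float]) -> Vector:
    return [x + 1.0 for x in emb]


if __name__ == "__main__":
    emb = [1.0, 2.0]
    # An all-zero gate gives every expert the same weight.
    uniform_gate = [[0.0, 0.0], [0.0, 0.0]]
    print(moe_controller(emb, uniform_gate,
                         [global_style_expert, local_edit_expert]))
```

In a learned model the gate rows would be trained so that, for example, a style-transfer instruction puts most of its weight on a global expert and an object-level instruction on a local one. Whether the paper uses soft blending of this kind or a hard routing scheme cannot be determined from the truncated abstract.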
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
Methods: ALIGN · Diffusion
