MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
Dewei Zhou, You Li, Fan Ma, Xiaoting Zhang, Yi Yang

TL;DR
This paper introduces the Multi-Instance Generation Controller (MIGC), a novel method for generating multiple diverse instances within a single image with precise control over location and attributes, advancing text-to-image synthesis.
Contribution
The paper proposes MIGC, which decomposes multi-instance generation into single-instance shading subtasks, each handled with an instance enhancement attention mechanism, and introduces the COCO-MIG benchmark for evaluation.
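The divide-and-conquer idea can be illustrated with a minimal NumPy sketch. It assumes that each instance is a normalized bounding box plus a description, and that a per-instance shading function (the hypothetical shade_fn) produces features for the whole latent map. It also assumes that overlapping instance outputs are simply averaged while uncovered pixels keep the original features. This averaging is a simplification chosen for illustration, not MIGC's actual aggregation step. Instance, box_to_mask, and divide_and_conquer are likewise hypothetical names.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Instance:
    # Hypothetical spec: normalized box (x0, y0, x1, y1) and a text description.
    box: tuple
    description: str

def box_to_mask(box, height, width):
    """Rasterize a normalized box into a binary (height, width) mask."""
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=np.float32)
    rows = slice(int(round(y0 * height)), int(round(y1 * height)))
    cols = slice(int(round(x0 * width)), int(round(x1 * width)))
    mask[rows, cols] = 1.0
    return mask

def divide_and_conquer(features, instances, shade_fn):
    """Shade each instance separately, then merge the results.

    features: (H, W, C) latent map; shade_fn(features, instance) -> (H, W, C).
    Each shaded map contributes only inside its instance's box; overlapping
    boxes are averaged and uncovered pixels keep the input features.
    (Plain averaging is an illustrative assumption, not the paper's method.)
    """
    h, w, _ = features.shape
    accum = np.zeros_like(features)
    weight = np.zeros((h, w, 1), dtype=features.dtype)
    for inst in instances:  # one subtask per instance
        mask = box_to_mask(inst.box, h, w)[..., None]
        accum += mask * shade_fn(features, inst)
        weight += mask
    merged = accum / np.maximum(weight, 1e-8)
    return np.where(weight > 0, merged, features)

# Toy shader: fills the map with a value looked up from the description.
VALUES = {"cat": 1.0, "dog": 3.0}

def shade(features, inst):
    return np.full_like(features, VALUES[inst.description])

feats = np.zeros((4, 4, 2), dtype=np.float32)
insts = [Instance((0.0, 0.0, 0.5, 0.5), "cat"), Instance((0.25, 0.25, 0.75, 0.75), "dog")]
out = divide_and_conquer(feats, insts, shade)
print(out[0, 0, 0], out[1, 1, 0], out[2, 2, 0], out[3, 3, 0])  # 1.0 2.0 3.0 0.0
```

Pixel (1, 1) lies in both boxes, so it receives the average of the "cat" and "dog" shadings. A learned per-pixel weighting could replace the plain average. The sketch only shows that each subtask handles one instance and its region.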
Findings
MIGC achieves high control accuracy in quantity, position, and attributes.
Extensive experiments demonstrate MIGC's superior performance on COCO-MIG and other benchmarks.
The approach enhances the versatility of text-to-image synthesis models.
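Position control is commonly checked by comparing each target box against a detected box with intersection-over-union (IoU). The sketch below assumes that convention, with boxes as (x0, y0, x1, y1) and a hypothetical success threshold of 0.5. It is not the exact COCO-MIG protocol, whose details are not given here, and box_iou and position_success_rate are illustrative names.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def position_success_rate(targets, detections, threshold=0.5):
    """Fraction of target boxes matched by a detection with IoU >= threshold.

    detections[i] is the box detected for targets[i], or None if missing.
    The 0.5 default is an assumed threshold, not taken from the paper.
    """
    if not targets:
        return 0.0
    hits = sum(
        1 for t, d in zip(targets, detections)
        if d is not None and box_iou(t, d) >= threshold
    )
    return hits / len(targets)

print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 0.14285714285714285
```

Attribute accuracy would need a separate check on each detected region (for example, a classifier), which this sketch leaves out.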
Abstract
We present the Multi-Instance Generation (MIG) task: simultaneously generating multiple instances with diverse controls in one image. Given a set of predefined coordinates and their corresponding descriptions, the task is to ensure that the generated instances are accurately placed at the designated locations and that every instance's attributes adhere to its description. This broadens the scope of current research on single-instance generation, elevating it to a more versatile and practical dimension. Inspired by the idea of divide and conquer, we introduce an innovative approach named Multi-Instance Generation Controller (MIGC) to address the challenges of the MIG task. First, we break down the MIG task into several subtasks, each involving the shading of a single instance. To ensure precise shading for each instance, we introduce an instance enhancement attention mechanism.…
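To make per-instance shading concrete, here is a minimal sketch of region-masked cross-attention for a single instance. Image tokens attend to that instance's text tokens, and the result is kept only for tokens inside the instance's box. This generic technique is a stand-in and is not claimed to be the paper's instance enhancement attention. The function name, the projection matrices w_q, w_k, w_v, and the flattened binary region_mask are assumptions.

```python
import numpy as np

def masked_cross_attention(image_tokens, text_tokens, region_mask, w_q, w_k, w_v):
    """Single-head cross-attention from image tokens to one instance's text tokens.

    image_tokens: (N, D) flattened latent features
    text_tokens:  (T, D) embedding of the instance description
    region_mask:  (N,) 1.0 inside the instance box, 0.0 outside
    """
    q = image_tokens @ w_q                        # (N, d) queries from the image
    k = text_tokens @ w_k                         # (T, d) keys from the description
    v = text_tokens @ w_v                         # (T, d) values from the description
    scores = q @ k.T / np.sqrt(q.shape[-1])       # (N, T) scaled dot products
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax over text tokens
    out = attn @ v                                # (N, d) instance-conditioned features
    return out * region_mask[:, None]             # zero out tokens outside the box

rng = np.random.default_rng(0)
image_tokens = rng.standard_normal((6, 4))
text_tokens = rng.standard_normal((3, 4))
region_mask = np.array([1, 1, 0, 0, 1, 0], dtype=np.float64)
w_q, w_k, w_v = (rng.standard_normal((4, 4)) for _ in range(3))
shaded = masked_cross_attention(image_tokens, text_tokens, region_mask, w_q, w_k, w_v)
print(shaded.shape)  # (6, 4)
```

Because each call sees only one description and one region, running it once per instance matches the "one subtask per instance" decomposition. How the per-instance outputs are then combined is a separate design choice.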
