MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition
Xinyu Wei, Kangrui Cen, Hongyang Wei, Zhen Guo, Kai Cui, Bairui Li, Zeqing Wang, Jinrui Zhang, Lei Zhang

TL;DR
This paper introduces MICo-150K, a large-scale high-quality dataset for multi-image composition, along with benchmarks and a new evaluation metric, to advance research in controllable image generation from multiple references.
Contribution
The authors curated MICo-150K, a comprehensive dataset with diverse MICo prompts, and developed MICo-Bench and a new metric, enabling better evaluation and training of MICo models.
Findings
MICo-150K improves model capabilities in multi-image composition.
Models fine-tuned on MICo-150K outperform previous methods.
Qwen-MICo supports arbitrary multi-image inputs and matches existing models in 3-image composition.
Abstract
In controllable image generation, synthesizing coherent and consistent images from multiple reference inputs, i.e., Multi-Image Composition (MICo), remains a challenging problem, partly hindered by the lack of high-quality training data. To bridge this gap, we conduct a systematic study of MICo, categorizing it into 7 representative tasks, curating a large-scale collection of high-quality source images, and constructing diverse MICo prompts. Leveraging powerful proprietary models, we synthesize a large, balanced set of composite images, followed by human-in-the-loop filtering and refinement, resulting in MICo-150K, a comprehensive dataset for MICo with identity consistency. We further build a Decomposition-and-Recomposition (De&Re) subset, where 11K real-world complex images are decomposed into components and recomposed, enabling both real and synthetic compositions. To enable…
