MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition

Xinyu Wei; Kangrui Cen; Hongyang Wei; Zhen Guo; Kai Cui; Bairui Li; Zeqing Wang; Jinrui Zhang; Lei Zhang

arXiv:2512.07348 · cs.CV · April 29, 2026

TL;DR

This paper introduces MICo-150K, a large-scale high-quality dataset for multi-image composition, along with benchmarks and a new evaluation metric, to advance research in controllable image generation from multiple references.

Contribution

The authors curate MICo-150K, a comprehensive dataset with diverse MICo prompts, and develop MICo-Bench together with a new evaluation metric, supporting both the training and the evaluation of MICo models.

Findings

01

MICo-150K improves model capabilities in multi-image composition.

02

Models fine-tuned on MICo-150K outperform previous methods.

03

Qwen-MICo supports arbitrary multi-image inputs and matches existing models in 3-image composition.

Abstract

In controllable image generation, synthesizing coherent and consistent images from multiple reference inputs, i.e., Multi-Image Composition (MICo), remains a challenging problem, partly hindered by the lack of high-quality training data. To bridge this gap, we conduct a systematic study of MICo: we categorize it into 7 representative tasks, curate a large-scale collection of high-quality source images, and construct diverse MICo prompts. Leveraging powerful proprietary models, we synthesize a large, balanced set of composite images, followed by human-in-the-loop filtering and refinement, resulting in MICo-150K, a comprehensive dataset for MICo with identity consistency. We further build a Decomposition-and-Recomposition (De&Re) subset, where 11K real-world complex images are decomposed into components and recomposed, enabling both real and synthetic compositions. To enable…
