ScaleEdit-12M: Scaling Open-Source Image Editing Data Generation via Multi-Agent Framework
Guanzhou Chen, Erfei Cui, Changyao Tian, Danni Yang, Ganlin Yang, Yu Qiao, Hongsheng Li, Gen Luo, Hongjie Zhang

TL;DR
This paper introduces ScaleEditor, an open-source multi-agent framework for generating large-scale, high-quality image editing data. The authors use it to build the ScaleEdit-12M dataset and report that fine-tuning on this data improves model performance.
Contribution
It presents a hierarchical multi-agent pipeline for scalable, high-quality image editing dataset construction that avoids both the closed-source annotation models and the fixed synthetic editing pipelines that earlier datasets depend on.
Findings
Created the largest open-source image editing dataset, ScaleEdit-12M.
Fine-tuning models on ScaleEdit-12M improves performance significantly.
Open-source framework approaches commercial-grade data quality.
Abstract
Instruction-based image editing has emerged as a key capability for unified multimodal models (UMMs), yet constructing large-scale, diverse, and high-quality editing datasets without costly proprietary APIs remains challenging. Previous image editing datasets either rely on closed-source models for annotation, which prevents cost-effective scaling, or employ fixed synthetic editing pipelines, which suffer from limited quality and generalizability. To address these challenges, we propose ScaleEditor, a fully open-source hierarchical multi-agent framework for end-to-end construction of large-scale, high-quality image editing datasets. Our pipeline consists of three key components: source image expansion with world-knowledge infusion, adaptive multi-agent editing instruction-image synthesis, and a task-aware data quality verification mechanism. Using ScaleEditor, we curate ScaleEdit-12M,…
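The three components named in the abstract can be read as a generate-then-filter pipeline. The sketch below is only an illustration of that control flow, not the authors' implementation: `EditSample`, `run_pipeline`, the `toy_*` stand-ins, and the `threshold` parameter are all hypothetical names introduced here, and the real stages would call open-source models rather than string stubs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class EditSample:
    """One candidate training example (hypothetical schema, not from the paper)."""
    source_image: str                   # path or ID of the source image
    task: str = ""                      # editing task type, e.g. "add_object"
    instruction: str = ""               # natural-language edit instruction
    edited_image: Optional[str] = None  # path or ID of the edited result
    score: float = 0.0                  # quality score assigned by the verifier


def run_pipeline(
    seeds: Iterable[str],
    expand: Callable[[Iterable[str]], Iterable[str]],
    synthesize: Callable[[str], EditSample],
    verify: Callable[[EditSample], float],
    threshold: float = 0.5,             # assumed cutoff, chosen for illustration
) -> List[EditSample]:
    """Chain the three stages and keep only samples that pass verification."""
    kept = []
    for source in expand(seeds):            # stage 1: source image expansion
        sample = synthesize(source)         # stage 2: instruction + edited image
        sample.score = verify(sample)       # stage 3: task-aware quality check
        if sample.score >= threshold:
            kept.append(sample)
    return kept


# Toy stand-ins so the sketch runs end to end.
def toy_expand(seeds):
    for s in seeds:
        yield s
        yield f"{s}#variant"                # pretend each seed yields one extra image


def toy_synthesize(source):
    return EditSample(
        source_image=source,
        task="add_object",
        instruction="add a red balloon",
        edited_image=f"{source}.edited",
    )


def toy_verify(sample):
    # A task-aware verifier would apply different checks per sample.task;
    # this stub simply gives the derived variants a low score.
    return 0.2 if sample.source_image.endswith("#variant") else 0.9


kept = run_pipeline(["img_001", "img_002"], toy_expand, toy_synthesize, toy_verify)
```

Passing the stages in as callables keeps the filtering loop independent of whichever models implement each stage, which is one plausible way to let a verifier reject low-quality samples before they reach the dataset.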
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Cell Image Analysis Techniques
