Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing
Yusu Qian, Eli Bocek-Rivele, Liangchen Song, Jialing Tong, Yinfei Yang, Jiasen Lu, Wenze Hu, Zhe Gan

TL;DR
Pico-Banana-400K is a large, high-quality dataset of 400,000 real images with diverse, instruction-based edits, designed to advance research in text-guided image editing by enabling complex, multi-turn, and preference-based tasks.
Contribution
The paper introduces Pico-Banana-400K, the first large-scale, systematically curated dataset for instruction-based image editing with real images, supporting complex editing scenarios and diverse task types.
Findings
Enables training of advanced text-guided image editing models
Supports research on multi-turn and preference-based editing
Provides high-quality, diverse, real-image edit pairs
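The three uses listed above imply three record shapes: single-turn edit pairs, preference examples, and multi-turn sessions. The sketch below shows one plausible way to represent them. It assumes that a preference example pairs a preferred ("chosen") edit with a dispreferred ("rejected") edit for the same source and instruction, and that each multi-turn step edits the previous step's output. The class and field names (`EditPair`, `PreferenceExample`, `MultiTurnSession`, `to_pairs`) are hypothetical and are not the dataset's released schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EditPair:
    """Single-turn example (hypothetical schema): source image, instruction, edited image."""
    source_path: str
    instruction: str
    edited_path: str


@dataclass
class PreferenceExample:
    """Same source and instruction, with a preferred and a dispreferred edit (assumed layout)."""
    source_path: str
    instruction: str
    chosen_path: str
    rejected_path: str


@dataclass
class MultiTurnSession:
    """A chain of edits in which each turn starts from the previous turn's output."""
    source_path: str
    instructions: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)

    def add_turn(self, instruction: str, edited_path: str) -> None:
        # Record one editing step and the image it produced.
        self.instructions.append(instruction)
        self.outputs.append(edited_path)

    def current_image(self) -> str:
        # Before any turn, the current image is the original photograph.
        return self.outputs[-1] if self.outputs else self.source_path

    def to_pairs(self) -> List[EditPair]:
        # Flatten into single-turn pairs: turn k edits the output of turn k-1.
        inputs = [self.source_path] + self.outputs[:-1]
        return [
            EditPair(src, ins, out)
            for src, ins, out in zip(inputs, self.instructions, self.outputs)
        ]
```

For example, a session that first recolors the sky (`t1.png`) and then adds an object (`t2.png`) flattens into two pairs, the second of which uses `t1.png` as its source. Flattening this way lets a single-turn editor train on multi-turn data, at the cost of dropping the conversational history.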
Abstract
Recent advances in multimodal models have demonstrated remarkable text-guided image editing capabilities, with systems like GPT-4o and Nano-Banana setting new benchmarks. However, the research community's progress remains constrained by the absence of large-scale, high-quality, and openly accessible datasets built from real images. We introduce Pico-Banana-400K, a comprehensive 400K-image dataset for instruction-based image editing. Our dataset is constructed by leveraging Nano-Banana to generate diverse edit pairs from real photographs in the OpenImages collection. What distinguishes Pico-Banana-400K from previous synthetic datasets is our systematic approach to quality and diversity. We employ a fine-grained image editing taxonomy to ensure comprehensive coverage of edit types while maintaining precise content preservation and instruction faithfulness through MLLM-based quality…
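The abstract describes a pipeline in which candidate edits, labeled by taxonomy category, are scored by an MLLM and kept only if they meet a quality bar. The following is a minimal sketch of that gating step under stated assumptions. The judge is any callable that returns a score in [0, 1]; it stands in for the MLLM and is not Nano-Banana's or any real model's API. The 0.7 threshold and the names `Candidate`, `filter_candidates`, and `coverage` are hypothetical. Rejected candidates are returned rather than discarded, on the assumption that failed edits could be reused as negatives for preference data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class Candidate:
    source_path: str
    category: str      # taxonomy label, e.g. hypothetical "style" or "object_addition"
    instruction: str
    edited_path: str


# Hypothetical judge: maps a candidate edit to a quality score in [0, 1].
Judge = Callable[[Candidate], float]


def filter_candidates(
    candidates: Iterable[Candidate],
    judge: Judge,
    threshold: float = 0.7,  # assumed cutoff, not taken from the paper
) -> Tuple[List[Candidate], List[Candidate]]:
    """Split candidates into (kept, rejected) by comparing judge scores to a threshold."""
    kept: List[Candidate] = []
    rejected: List[Candidate] = []
    for cand in candidates:
        score = judge(cand)
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"judge score out of range: {score}")
        (kept if score >= threshold else rejected).append(cand)
    return kept, rejected


def coverage(kept: Iterable[Candidate]) -> Dict[str, int]:
    """Count surviving edits per taxonomy category to check balance across edit types."""
    return dict(Counter(c.category for c in kept))
```

Reporting per-category counts after filtering matters because a uniform threshold can remove harder edit types disproportionately, which would erode the taxonomy coverage that the abstract emphasizes.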
Taxonomy
Topics: Digital Humanities and Scholarship · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
