Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

Yusu Qian; Eli Bocek-Rivele; Liangchen Song; Jialing Tong; Yinfei Yang; Jiasen Lu; Wenze Hu; Zhe Gan

arXiv:2510.19808 · cs.CV · October 23, 2025

Open Access

TL;DR

Pico-Banana-400K is a large, high-quality dataset of 400,000 real images with diverse, instruction-based edits, designed to advance research in text-guided image editing by enabling complex, multi-turn, and preference-based tasks.

Contribution

The paper introduces Pico-Banana-400K, the first large-scale, systematically curated dataset for instruction-based image editing with real images, supporting complex editing scenarios and diverse task types.

Findings

01. Enables training of advanced text-guided image editing models
02. Supports research on multi-turn and preference-based editing
03. Provides high-quality, diverse, real-image edit pairs

Abstract

Recent advances in multimodal models have demonstrated remarkable text-guided image editing capabilities, with systems like GPT-4o and Nano-Banana setting new benchmarks. However, the research community's progress remains constrained by the absence of large-scale, high-quality, and openly accessible datasets built from real images. We introduce Pico-Banana-400K, a comprehensive 400K-image dataset for instruction-based image editing. Our dataset is constructed by leveraging Nano-Banana to generate diverse edit pairs from real photographs in the OpenImages collection. What distinguishes Pico-Banana-400K from previous synthetic datasets is our systematic approach to quality and diversity. We employ a fine-grained image editing taxonomy to ensure comprehensive coverage of edit types while maintaining precise content preservation and instruction faithfulness through MLLM-based quality…

Taxonomy

Topics: Digital Humanities and Scholarship · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications