Med-Banana-50K: A Cross-modality Large-Scale Dataset for Text-guided Medical Image Editing
Zhihui Chen, Mengling Feng

TL;DR
Med-Banana-50K is a large-scale dataset of over 50,000 medically curated image edits spanning chest X-ray, brain MRI, and fundus photography, designed to advance research in medical image editing under strict anatomical and clinical constraints and built with a medically grounded quality control protocol.
Contribution
The paper introduces Med-Banana-50K, a comprehensive dataset with a novel quality control protocol and extensive evaluation logs, supporting reliable medical image editing research.
Findings
Over 50,000 curated medical image edits across chest X-ray, brain MRI, and fundus photography, covering 23 diseases.
Inclusion of 37,000 failed editing attempts with evaluation logs.
A new LLM-based quality control framework for medical image editing.
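To make the relationship between curated edits, failed attempts, and evaluation logs concrete, here is a minimal Python sketch of how one might represent a sample and separate accepted from rejected attempts. It is not the dataset's actual schema or the authors' protocol: every class name, field name, criterion name, the placeholder disease label, and the 0.8 threshold are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class EditAttempt:
    """One editing attempt and its judge log (hypothetical layout)."""
    instruction: str
    edit_type: str  # "addition" or "removal" (bidirectional lesion editing)
    # Hypothetical per-criterion scores in [0, 1]; the real criteria and
    # scoring scale are not specified here.
    judge_scores: Dict[str, float]


@dataclass
class Sample:
    """A source image together with all edit attempts made on it."""
    modality: str  # e.g. "chest X-ray", "brain MRI", "fundus photography"
    disease: str   # placeholder label, not taken from the dataset
    attempts: List[EditAttempt] = field(default_factory=list)


def passes(attempt: EditAttempt, threshold: float = 0.8) -> bool:
    """Accept an attempt only if every criterion meets the assumed threshold."""
    scores = attempt.judge_scores.values()
    return bool(attempt.judge_scores) and all(s >= threshold for s in scores)


def split_attempts(
    sample: Sample, threshold: float = 0.8
) -> Tuple[List[EditAttempt], List[EditAttempt]]:
    """Return (accepted, rejected) attempts; rejected ones keep their logs."""
    accepted = [a for a in sample.attempts if passes(a, threshold)]
    rejected = [a for a in sample.attempts if not passes(a, threshold)]
    return accepted, rejected


sample = Sample(
    modality="chest X-ray",
    disease="hypothetical_disease",
    attempts=[
        EditAttempt("add lesion", "addition",
                    {"criterion_a": 0.9, "criterion_b": 0.85}),
        EditAttempt("add lesion", "addition",
                    {"criterion_a": 0.9, "criterion_b": 0.40}),
    ],
)
accepted, rejected = split_attempts(sample)
print(len(accepted), len(rejected))
```

Keeping rejected attempts alongside their scores, rather than discarding them, is what would let failed edits be reused later, for example as negative examples.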
Abstract
Medical image editing has emerged as a pivotal technology with broad applications in data augmentation, model interpretability, medical education, and treatment simulation. However, the lack of large-scale, high-quality, and openly accessible datasets tailored for medical contexts with strict anatomical and clinical constraints has significantly hindered progress in this domain. To bridge this gap, we introduce Med-Banana-50K, a comprehensive dataset of over 50k medically curated image edits spanning chest X-ray, brain MRI, and fundus photography across 23 diseases. Each sample supports bidirectional lesion editing (addition and removal) and is constructed using Gemini-2.5-Flash-Image based on real clinical images. A key differentiator of our dataset is the medically grounded quality control protocol: we employ an LLM-as-Judge evaluation framework with criteria such as instruction…
Taxonomy
Topics: Cell Image Analysis Techniques · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
