SciDraw-6K: A Multilingual Scientific Illustration Dataset Generated by Google Gemini

Davie Chen

arXiv:2604.17206·cs.CV·April 21, 2026

TL;DR

SciDraw-6K is a multilingual dataset of 6,291 scientific illustrations generated by Google Gemini models, designed to advance research in scientific visualization and domain-specific image synthesis.

Contribution

The paper introduces a purpose-built, multilingual scientific illustration dataset and documents its construction pipeline, with the aim of supporting research on domain-adapted diffusion models and prompt engineering.

Findings

01  The dataset enables multilingual scientific visualization research

02  Supports fine-tuning of diffusion models for scientific images

03  Provides a public resource for scientific drawing applications

Abstract

We present SciDraw-6K, a curated dataset of 6,291 scientific illustrations synthesized by Google Gemini image-generation models, each paired with prompts in eleven languages (English, Simplified Chinese, Traditional Chinese, Japanese, Korean, German, French, Spanish, Brazilian Portuguese, Italian, and Russian). Images span eight broad scientific categories -- biomedical, chemistry, materials, electronics, environment, AI systems, physics, and a long "other" tail -- and are produced primarily by the gemini-2.5-flash-image and gemini-3-pro-image-preview model families. In contrast to general-purpose text-to-image corpora that dominate the literature, SciDraw-6K is purpose-built for the scientific illustration genre: schematic diagrams, mechanism figures, table-of-contents graphics, and conceptual posters. We describe the construction pipeline, report dataset statistics, and document its…
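To make the structure described in the abstract concrete, here is a minimal sketch of how one image-plus-prompts record could be represented and checked. The eleven languages, eight categories, and two model names come from the abstract; the `Record` class, its field names, the language codes, and the category keys are hypothetical and are not taken from the released dataset, whose actual schema may differ.

```python
from dataclasses import dataclass, field

# The eleven prompt languages named in the abstract.
# The short codes are an assumption for illustration only.
LANGUAGES = (
    "en", "zh-Hans", "zh-Hant", "ja", "ko", "de",
    "fr", "es", "pt-BR", "it", "ru",
)

# The eight categories named in the abstract; the key spellings are assumptions.
CATEGORIES = (
    "biomedical", "chemistry", "materials", "electronics",
    "environment", "ai_systems", "physics", "other",
)


@dataclass
class Record:
    """Hypothetical record: one generated image with its multilingual prompts."""
    image_path: str
    category: str
    model: str  # e.g. "gemini-2.5-flash-image"
    prompts: dict = field(default_factory=dict)  # language code -> prompt text


def validate(record: Record) -> list:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    if record.category not in CATEGORIES:
        problems.append(f"unknown category: {record.category}")
    for lang in LANGUAGES:
        # A prompt counts as missing if it is absent or only whitespace.
        if not record.prompts.get(lang, "").strip():
            problems.append(f"missing prompt: {lang}")
    return problems


# Usage: a record with a prompt in every language passes validation.
example = Record(
    image_path="images/000001.png",  # hypothetical path
    category="chemistry",
    model="gemini-2.5-flash-image",
    prompts={lang: f"placeholder prompt ({lang})" for lang in LANGUAGES},
)
print(validate(example))
```

A check of this kind is one way to confirm that every image carries a prompt in each language before the data is used for fine-tuning or prompt-engineering experiments.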

Code & Models

Repositories

SciDrawAI/scidraw-6k (GitHub)