Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility

Honglin Lin; Chonghan Qin; Zheng Liu; Qizhi Pei; Yu Li; Zhanping Zhong; Xin Gao; Yanfeng Wang; Conghui He; Lijun Wu

arXiv:2601.17027·cs.CV·January 27, 2026

TL;DR

This paper systematically evaluates scientific image synthesis methods, introduces a logic-driven framework (ImgCoder) and a benchmark (SciGenBench), and shows that fine-tuning large multimodal models on verified images improves their scientific reasoning.

Contribution

The paper presents ImgCoder, a logic-driven synthesis framework that follows an explicit "understand - plan - code" workflow to improve structural precision, and SciGenBench, a benchmark for assessing the scientific correctness of generated images.

Findings

01. Pixel-based models exhibit systematic failure modes.

02. Scientific image synthesis faces a fundamental trade-off between expressiveness and precision.

03. Fine-tuning large multimodal models (LMMs) on verified images improves reasoning performance.

Abstract

While synthetic data has proven effective for improving scientific reasoning in the text domain, multimodal reasoning remains constrained by the difficulty of synthesizing scientifically rigorous images. Existing Text-to-Image (T2I) models often produce outputs that are visually plausible yet scientifically incorrect, resulting in a persistent visual-logic divergence that limits their value for downstream reasoning. Motivated by recent advances in next-generation T2I models, we conduct a systematic study of scientific image synthesis across generation paradigms, evaluation, and downstream use. We analyze both direct pixel-based generation and programmatic synthesis, and propose ImgCoder, a logic-driven framework that follows an explicit "understand - plan - code" workflow to improve structural precision. To rigorously assess scientific correctness, we introduce SciGenBench, which…

Code & Models

Datasets

J017athan/SciGenBench (dataset, 353 downloads)

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Data Visualization and Analytics