MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding
Zekun Li, Xianjun Yang, Kyuri Choi, Wanrong Zhu, Ryan Hsieh, HyeonJung Kim, Jin Hyuk Lim, Sungyoung Ji, Byungju Lee, Xifeng Yan, Linda Ruth Petzold, Stephen D. Wilson, Woosang Lim, William Yang Wang

TL;DR
This paper introduces a comprehensive, multi-disciplinary dataset for scientific figure interpretation, enabling advanced AI models to understand complex scientific visuals and outperform human experts in certain tasks.
Contribution
The paper presents a large, diverse dataset of complex scientific figures spanning 72 scientific fields, and shows that models fine-tuned on it outperform existing models such as GPT-4o, as well as humans, on multiple-choice tasks.
Findings
Models fine-tuned on the dataset outperform GPT-4o and humans in multiple-choice tasks.
Pre-training on article-figure data improves performance in materials science.
The dataset covers complex visualizations requiring graduate-level expertise.
Abstract
Scientific figure interpretation is a crucial capability for AI-driven scientific assistants built on advanced Large Vision Language Models. However, current datasets and benchmarks primarily focus on simple charts or other relatively straightforward figures from limited science domains. To address this gap, we present a comprehensive dataset compiled from peer-reviewed Nature Communications articles covering 72 scientific fields, encompassing complex visualizations such as schematic diagrams, microscopic images, and experimental data which require graduate-level expertise to interpret. We evaluated 19 proprietary and open-source models on two benchmark tasks, figure captioning and multiple-choice, and conducted human expert annotation. Our analysis revealed significant task challenges and performance gaps among models. Beyond serving as a benchmark, this dataset serves as a valuable…
Taxonomy
Topics: Topic Modeling
Methods: Focus
