MaterialFigBENCH: benchmark dataset with figures for evaluating college-level materials science problem-solving abilities of multimodal large language models
Michiko Yoshitake, Yuta Suzuki, Ryo Igarashi, Yoshitaka Ushiku, Keisuke Nagato

TL;DR
MaterialFigBench is a figure-based benchmark dataset for evaluating how well multimodal large language models solve university-level materials science problems that require figure interpretation. It reveals current models' limitations in visual reasoning and numerical accuracy.
Contribution
The paper introduces MaterialFigBench, a specialized benchmark dataset with figures for assessing multimodal LLMs in materials science problem-solving, highlighting existing challenges and guiding future improvements.
Findings
Current multimodal LLMs struggle with visual understanding of figures.
Models often rely on memorized knowledge rather than image reading.
Performance varies across problem types and model versions.
Abstract
We present MaterialFigBench, a benchmark dataset designed to evaluate the ability of multimodal large language models (LLMs) to solve university-level materials science problems that require accurate interpretation of figures. Unlike existing benchmarks that primarily rely on textual representations, MaterialFigBench focuses on problems in which figures such as phase diagrams, stress-strain curves, Arrhenius plots, diffraction patterns, and microstructural schematics are indispensable for deriving correct answers. The dataset consists of 137 free-response problems adapted from standard materials science textbooks, covering a broad range of topics including crystal structures, mechanical properties, diffusion, phase diagrams, phase transformations, and electronic properties of materials. To address unavoidable ambiguity in reading numerical values from images, expert-defined answer…
Taxonomy
Topics: Machine Learning in Materials Science · Catalysis and Oxidation Reactions · Multimodal Machine Learning Applications
