Seeing Beyond Words: MatVQA for Challenging Visual-Scientific Reasoning in Materials Science

Sifan Wu; Huan Zhang; Yizhan Li; Farshid Effaty; Amirreza Ataei; Bang Liu

arXiv:2505.18319·cs.CE·May 27, 2025

Open Access

TL;DR

This paper introduces MatVQA, a new benchmark for evaluating multimodal models' ability to perform detailed visual and scientific reasoning in materials science, addressing limitations of existing text-based datasets.

Contribution

We developed MatVQA, a novel, automatically generated benchmark with 1325 questions that challenge models to analyze material images and perform multi-step scientific reasoning, filling a critical gap in materials science AI evaluation.

Findings

1. Current MLLMs show significant performance gaps on MatVQA.

2. MatVQA emphasizes fine-grained visual analysis combined with scientific reasoning.

3. Benchmark data and code are publicly available to foster further research.

Abstract

The emergence of Multimodal Large Language Models (MLLMs) that integrate vision and language modalities has unlocked new potential for scientific reasoning, outperforming prior benchmarks in both natural language and coding domains. Current materials science evaluation datasets such as MaScQA and SciQA remain largely text-based and fail to capture the visual and research-level analytic complexity required in materials discovery and design. We introduce MatVQA, a scalable benchmark specifically designed to address this gap. Generated via an automated pipeline, MArxivAgent, from recent materials literature, MatVQA features 1325 questions across four critical structure-property-performance (SPP) reasoning tasks. Uniquely, MatVQA employs an iterative process to eliminate textual shortcuts, compelling MLLMs to perform fine-grained, low-level visual analysis of material imagery (e.g.,…

Taxonomy

Topics: Semantic Web and Ontologies