MatTools: Benchmarking Large Language Models for Materials Science Tools

Siyu Liu; Bo Hu; Beilin Ye; Jiamin Xu; David J. Srolovitz; Tongqi Wen

arXiv:2505.10852·cond-mat.mtrl-sci·December 17, 2025

Open Access · 1 Repo · 1 Dataset

TL;DR

MatTools introduces a comprehensive benchmark to evaluate large language models' ability to understand and generate code for materials science applications, combining QA and real-world tool usage assessments.

Contribution

This work presents a novel benchmark framework, including a large QA dataset and real-world code generation tasks, for evaluating LLMs in materials science contexts.

Findings

01 Generalist LLMs outperform specialists

02 AI models are aware of other AI models

03 Simpler models perform better in this domain

Abstract

Large language models (LLMs) are increasingly applied to materials science questions, including literature comprehension, property prediction, materials discovery and alloy design. At the same time, a wide range of physics-based computational approaches have been developed with which materials properties can be calculated. Here, we propose a benchmark application to evaluate the proficiency of LLMs in answering materials science questions through the generation and safe execution of code based on such physics-based computational materials science packages. MatTools is built on two complementary components: a materials simulation tool question-answer (QA) benchmark and a real-world tool-usage benchmark. We designed an automated methodology to efficiently collect real-world materials science tool-use examples. The QA benchmark, derived from the pymatgen (Python Materials Genomics) codebase…

Code & Models

Repositories

Grenzlinie/MatTools
none · Official

Datasets

SiyuLiu/MatTools
dataset · 35 dl

Taxonomy

Topics: Machine Learning in Materials Science · Artificial Intelligence in Healthcare and Education · Inorganic Chemistry and Materials