Evaluating the Performance and Robustness of LLMs in Materials Science Q&A and Property Predictions
Hongchen Wang, Kangming Li, Scott Ramsay, Yao Fehlis, Edward Kim, and Jason Hattrick-Simpers

TL;DR
This paper evaluates the performance and robustness of large language models on materials science question answering and property prediction, revealing their strengths and vulnerabilities under real-world and adversarial conditions.
Contribution
It provides a comprehensive assessment of LLMs in materials science, highlighting their behaviors, limitations, and potential for improvement in domain-specific applications.
Findings
LLM performance varies across materials science tasks (question answering and property prediction).
Adversarial and noisy inputs challenge the robustness of LLMs.
Unique phenomena like mode collapse and performance recovery are observed.
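One way to quantify the robustness finding above is to compare accuracy on clean inputs with accuracy on perturbed copies of the same inputs. The sketch below is illustrative only: the summary does not say which perturbations the authors used, so the adjacent-character swap in `add_typos`, the `predict` callable standing in for an LLM, and the `robustness_gap` metric are all assumptions introduced here.

```python
import random
from typing import Callable, List, Tuple

Item = Tuple[str, str]  # (question text, correct answer label)


def add_typos(text: str, rate: float, rng: random.Random) -> str:
    """Swap adjacent letters with probability `rate` (hypothetical noise model)."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair so it is not swapped back
        else:
            i += 1
    return "".join(chars)


def accuracy(predict: Callable[[str], str], items: List[Item]) -> float:
    """Fraction of items for which the model's answer matches the label."""
    correct = sum(predict(question) == answer for question, answer in items)
    return correct / len(items)


def robustness_gap(
    predict: Callable[[str], str], items: List[Item], rate: float, seed: int = 0
) -> float:
    """Clean accuracy minus noisy accuracy; larger means less robust."""
    rng = random.Random(seed)
    noisy = [(add_typos(question, rate, rng), answer) for question, answer in items]
    return accuracy(predict, items) - accuracy(predict, noisy)
```

Here `predict` would wrap whatever model is under test. A gap near zero suggests the model tolerates this kind of noise, while a large positive gap indicates degradation.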
Abstract
Large Language Models (LLMs) have the potential to revolutionize scientific research, yet their robustness and reliability in domain-specific applications remain insufficiently explored. In this study, we evaluate the performance and robustness of LLMs for materials science, focusing on domain-specific question answering and materials property prediction across diverse real-world and adversarial conditions. Three distinct datasets are used in this study: 1) a set of multiple-choice questions from undergraduate-level materials science courses, 2) a dataset including various steel compositions and yield strengths, and 3) a band gap dataset, containing textual descriptions of material crystal structures and band gap values. The performance of LLMs is assessed using various prompting strategies, including zero-shot chain-of-thought, expert prompting, and few-shot in-context learning. The…
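The three prompting strategies named in the abstract can be illustrated with a minimal sketch. The exact prompt wording used in the paper is not given here, so the templates below are assumptions: the "Let's think step by step" trigger is the phrase commonly associated with zero-shot chain-of-thought, and the function names, the expert persona text, and the Q/A layout are hypothetical.

```python
from typing import List, Tuple


def zero_shot_cot(question: str) -> str:
    """Zero-shot chain-of-thought: append a reasoning trigger, no examples."""
    return f"{question}\nLet's think step by step."


def expert_prompt(question: str, field: str = "materials science") -> str:
    """Expert prompting: prepend a persona line (wording is an assumption)."""
    return f"You are an expert in {field}.\n{question}"


def few_shot(question: str, examples: List[Tuple[str, str]]) -> str:
    """Few-shot in-context learning: show solved examples before the query."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\n\nQ: {question}\nA:"
```

For property prediction, the few-shot examples would pair an input description (for instance a steel composition) with its known target value, leaving the final answer slot for the model to fill in.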
Taxonomy
Topics: Big Data and Business Intelligence · Data Quality and Management
Methods: Sparse Evolutionary Training
