Large language models in materials science and the need for open-source approaches
Fengxu Yang, Weitong Chen, Jack D. Evans

TL;DR
This paper reviews how large language models are transforming materials science, emphasizing the importance of open-source models for transparency, reproducibility, and community-driven scientific discovery.
Contribution
It highlights the potential of open-source LLMs in materials science and demonstrates their performance parity with commercial models through benchmark results.
Findings
Open-source LLMs match commercial model performance
Open-source models offer greater transparency and reproducibility
Open-source models are cost-effective and privacy-preserving
Abstract
Large language models (LLMs) are rapidly transforming materials science. This review examines recent LLM applications across the materials discovery pipeline, focusing on three key areas: mining scientific literature, predictive modelling, and multi-agent experimental systems. We highlight how LLMs extract valuable information such as synthesis conditions from text, learn structure-property relationships, and coordinate agentic systems that integrate computational tools and laboratory automation. While progress has largely depended on closed-source commercial models, our benchmark results demonstrate that open-source alternatives can match their performance while offering greater transparency, reproducibility, cost-effectiveness, and data privacy. As open-source models continue to improve, we advocate their broader adoption to build accessible, flexible, and community-driven AI…
Taxonomy
Topics: Machine Learning in Materials Science · Artificial Intelligence in Healthcare and Education · Inorganic Chemistry and Materials
