From Text to Insight: Large Language Models for Materials Science Data Extraction
Mara Schilling-Wilhelmi, Martiño Ríos-García, Sherjeel Shabih, María Victoria Gil, Santiago Miret, Christoph T. Koch, José A. Márquez, Kevin Maik Jablonka

TL;DR
This paper reviews how large language models can transform materials science by efficiently extracting structured data from unstructured text, addressing current challenges and future opportunities in the field.
Contribution
It provides a comprehensive overview of LLM-based data extraction methods in materials science, highlighting frameworks and guiding principles for future research.
Findings
LLMs can automate data extraction from scientific literature.
Domain knowledge improves LLM accuracy and validation.
Frameworks for integrating LLMs with materials science are proposed.
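As an illustration of the second finding, the sketch below shows one way domain knowledge might be used to validate an LLM's structured output. It is a minimal example and not the paper's method: the `BandGapRecord` schema, the JSON field names, and the 0 to 15 eV plausibility window are all assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class BandGapRecord:
    """Hypothetical schema for one extracted data point."""
    material: str
    band_gap_ev: float


def validate_record(raw_json: str) -> BandGapRecord:
    """Parse a (hypothetical) LLM JSON output and apply simple domain checks."""
    data = json.loads(raw_json)
    material = str(data["material"]).strip()
    band_gap = float(data["band_gap_ev"])

    if not material:
        raise ValueError("material name is empty")
    # Illustrative plausibility window (an assumption, not from the paper):
    # band gaps are non-negative, and values above ~15 eV are treated as suspect.
    if not 0.0 <= band_gap <= 15.0:
        raise ValueError(f"implausible band gap: {band_gap} eV")

    return BandGapRecord(material=material, band_gap_ev=band_gap)
```

A record that fails such a check could be flagged for manual review or sent back to the model for re-extraction rather than silently stored.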
Abstract
The vast majority of materials science knowledge exists in unstructured natural language, yet structured data is crucial for innovative and systematic materials design. Traditionally, the field has relied on manual curation and partial automation for data extraction for specific use cases. The advent of large language models (LLMs) represents a significant shift, potentially enabling efficient extraction of structured, actionable data from unstructured text by non-experts. While applying LLMs to materials science data extraction presents unique challenges, domain knowledge offers opportunities to guide and validate LLM outputs. This review provides a comprehensive overview of LLM-based structured data extraction in materials science, synthesizing current knowledge and outlining future directions. We address the lack of standardized guidelines and present frameworks for leveraging the…
