Accelerated materials language processing enabled by GPT
Jaewoong Choi, Byungju Lee

TL;DR
This paper demonstrates that GPT-based prompt engineering can effectively replace complex traditional models for materials language processing tasks, achieving comparable or improved results with less data and effort.
Contribution
The study introduces GPT-enabled pipelines for document classification, NER, and extractive QA in materials science, simplifying architecture and reducing data requirements.
Findings
GPT-based classification matches prior accuracy with small datasets
Few-shot prompts improve entity recognition performance
GPT-enabled QA can automatically correct annotations
Abstract
Materials language processing (MLP) is one of the key facilitators of materials science research, as it enables the extraction of structured information from the massive materials science literature. Prior works proposed high-performance MLP models for text classification, named entity recognition (NER), and extractive question answering (QA), which require complex model architectures, exhaustive fine-tuning, and a large number of human-labelled datasets. In this study, we develop generative pretrained transformer (GPT)-enabled pipelines in which the complex architectures of prior MLP models are replaced with strategic prompt engineering. First, we develop a GPT-enabled document classification method for screening relevant documents, achieving accuracy and reliability comparable to prior models with only a small dataset. Second, for the NER task, we design an entity-centric…
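The prompt-based document classification described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the `Text:`/`Label:` prompt layout, the label set, and the example sentences are all assumptions made for illustration, and no model API is called. The sketch only shows how a few labelled examples might be assembled into a prompt and how a free-text completion could be mapped back to a label.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    task_instruction: str,
    examples: List[Tuple[str, str]],
    query_text: str,
) -> str:
    """Assemble a few-shot classification prompt as plain text.

    The layout (instruction, then Text/Label pairs, then the open query)
    is a hypothetical format, not one taken from the paper.
    """
    parts = [task_instruction.strip(), ""]
    for text, label in examples:
        parts.append(f"Text: {text.strip()}")
        parts.append(f"Label: {label}")
        parts.append("")
    # The final "Label:" is left open for the model to complete.
    parts.append(f"Text: {query_text.strip()}")
    parts.append("Label:")
    return "\n".join(parts)


def parse_label(completion: str, allowed_labels: List[str], default: str) -> str:
    """Map a free-text completion onto one of the allowed labels.

    Falls back to `default` when the completion starts with none of them.
    """
    cleaned = completion.strip().lower()
    for label in allowed_labels:
        if cleaned.startswith(label.lower()):
            return label
    return default


# Hypothetical labels and example texts, for illustration only.
LABELS = ["relevant", "not relevant"]
prompt = build_few_shot_prompt(
    "Decide whether each text is relevant to the target materials topic.",
    [
        ("We report the cycling stability of a layered oxide cathode.", "relevant"),
        ("This survey reviews consumer attitudes to online retail.", "not relevant"),
    ],
    "A solid electrolyte with high ionic conductivity is synthesized.",
)
print(prompt)
```

The returned string would be sent to whichever language model is in use, and the model's completion passed to `parse_label`. Keeping prompt construction and output parsing as separate pure functions makes each step easy to test without network access.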
Taxonomy
Topics: Machine Learning in Materials Science · Topic Modeling · Software Engineering Research
