Accelerated materials language processing enabled by GPT

Jaewoong Choi; Byungju Lee

arXiv:2308.09354 · cs.CL · August 21, 2023 · 2 cites

Open Access

TL;DR

This paper demonstrates that GPT-based prompt engineering can effectively replace complex traditional models for materials language processing tasks, achieving comparable or improved results with less data and effort.

Contribution

The study introduces GPT-enabled pipelines for document classification, NER, and extractive QA in materials science, simplifying architecture and reducing data requirements.

Findings

01 GPT-based classification matches prior accuracy with small datasets

02 Few-shot prompts improve entity recognition performance

03 GPT-enabled QA can automatically correct annotations

Abstract

Materials language processing (MLP) is one of the key facilitators of materials science research, as it enables the extraction of structured information from the massive materials science literature. Prior works proposed high-performance MLP models for text classification, named entity recognition (NER), and extractive question answering (QA), which require complex model architectures, exhaustive fine-tuning, and a large number of human-labelled datasets. In this study, we develop generative pretrained transformer (GPT)-enabled pipelines in which the complex architectures of prior MLP models are replaced with strategic designs of prompt engineering. First, we develop a GPT-enabled document classification method for screening relevant documents, achieving accuracy and reliability comparable to prior models with only a small dataset. Second, for the NER task, we design an entity-centric…
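As a rough illustration of the prompt-engineering idea in the abstract, the classification step could be sketched as a few-shot prompt builder plus a parser for the model's reply. This is a minimal sketch under stated assumptions, not the authors' pipeline: the `Example` type, the function names, the prompt wording, and the "battery" / "non-battery" labels are all hypothetical, and the call to an actual GPT model is deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One human-labelled demonstration (hypothetical structure)."""
    text: str
    label: str


def build_classification_prompt(task: str, labels: List[str],
                                examples: List[Example], query: str) -> str:
    """Assemble a few-shot prompt: task description, demonstrations, then the query."""
    lines = [task, "Allowed labels: " + ", ".join(labels), ""]
    for ex in examples:
        lines.append(f"Text: {ex.text}")
        lines.append(f"Label: {ex.label}")
        lines.append("")
    # The prompt ends with an open "Label:" slot for the model to complete.
    lines.append(f"Text: {query}")
    lines.append("Label:")
    return "\n".join(lines)


def parse_label(completion: str, labels: List[str], default: str = "unknown") -> str:
    """Map a free-text completion back onto one of the allowed labels."""
    cleaned = completion.strip().lower()
    # Check longer labels first so a label is never shadowed by a shorter prefix.
    for label in sorted(labels, key=len, reverse=True):
        if cleaned.startswith(label.lower()):
            return label
    return default


# Hypothetical labels and demonstrations, for illustration only.
labels = ["battery", "non-battery"]
examples = [
    Example("We report a cathode material with improved cycling stability.", "battery"),
    Example("We study the tensile strength of a structural steel alloy.", "non-battery"),
]
prompt = build_classification_prompt(
    "Classify each abstract by topic.", labels, examples,
    "A solid electrolyte with high ionic conductivity is synthesized.",
)
```

The resulting `prompt` string would be sent to whichever GPT model is in use, and the returned text passed to `parse_label`. Falling back to a default label when the reply matches no allowed label is one simple way to handle off-format completions.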

Taxonomy

Topics: Machine Learning in Materials Science · Topic Modeling · Software Engineering Research