Extracting Polymer Nanocomposite Samples from Full-Length Documents

Ghazal Khalighinejad; Defne Circi; L.C. Brinson; Bhuwan Dhingra

arXiv:2403.00260 · cs.CL · March 4, 2024 · 3 cites

TL;DR

This paper explores using large language models to extract detailed polymer nanocomposite sample data from full research articles, introducing a new benchmark and evaluation methods for this complex task.

Contribution

It presents a novel benchmark and evaluation framework for extracting PNC samples from full texts, addressing annotation challenges and analyzing LLM performance and errors.

Findings

01. LLMs struggle to extract all samples accurately
02. Self-consistency improves extraction performance
03. Three main error categories identified and analyzed

Abstract

This paper investigates the use of large language models (LLMs) for extracting sample lists of polymer nanocomposites (PNCs) from full-length materials science research papers. The challenge lies in the complex nature of PNC samples, whose numerous attributes are scattered throughout the text. The complexity of annotating detailed information on PNCs limits the availability of data, and because comprehensive named entity span annotations are hard to create, conventional document-level relation extraction techniques are impractical. To address this, we introduce a new benchmark and an evaluation technique for this task and explore different prompting strategies in a zero-shot manner. We also incorporate self-consistency to improve performance. Our findings show that even advanced LLMs struggle to extract all of the samples from an article. Finally, we analyze the errors…
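
The abstract does not say how self-consistency is applied to extracted sample lists. One plausible reading, given here as a minimal sketch and not as the authors' implementation, is to run the extraction several times and keep only the samples that appear in a majority of runs. The function names, the `min_fraction` threshold, and the attribute keys (`matrix`, `filler`, `filler_wt_pct`) are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def canonical(sample: dict) -> tuple:
    """Turn a sample dict into a hashable key that ignores case, spacing, and key order."""
    return tuple(
        sorted((k.strip().lower(), str(v).strip().lower()) for k, v in sample.items())
    )


def self_consistent_samples(runs: list, min_fraction: float = 0.5) -> list:
    """Keep samples that appear in more than `min_fraction` of the extraction runs.

    `runs` holds one list of sample dicts per (hypothetical) LLM extraction run.
    """
    counts = Counter()
    first_seen = {}
    for run in runs:
        # De-duplicate within a run so each run votes at most once per sample.
        unique = {}
        for sample in run:
            unique.setdefault(canonical(sample), sample)
        for key, sample in unique.items():
            counts[key] += 1
            first_seen.setdefault(key, sample)
    threshold = min_fraction * len(runs)
    return [first_seen[key] for key, count in counts.items() if count > threshold]


# Three hypothetical extraction runs over the same article (made-up values).
runs = [
    [
        {"matrix": "Epoxy", "filler": "silica", "filler_wt_pct": "5"},
        {"matrix": "epoxy", "filler": "silica", "filler_wt_pct": "10"},
    ],
    [
        {"matrix": "epoxy", "filler": "Silica", "filler_wt_pct": "5"},
        {"matrix": "PMMA", "filler": "clay", "filler_wt_pct": "2"},
    ],
    [
        {"matrix": "epoxy", "filler": "silica", "filler_wt_pct": "5"},
        {"matrix": "epoxy", "filler": "silica", "filler_wt_pct": "10"},
    ],
]

kept = self_consistent_samples(runs)
print(len(kept))
```

Voting on whole samples is the strictest choice: two runs that disagree on a single attribute count as different samples. A softer variant could vote per attribute instead, at the cost of possibly assembling a sample that no single run produced.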

Code & Models

Repositories

ghazalkhalighinejad/pncextract · tf · Official

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction