Extracting Polymer Nanocomposite Samples from Full-Length Documents
Ghazal Khalighinejad, Defne Circi, L.C. Brinson, Bhuwan Dhingra

TL;DR
This paper explores using large language models to extract detailed polymer nanocomposite sample data from full research articles, introducing a new benchmark and evaluation methods for this complex task.
Contribution
It presents a novel benchmark and evaluation framework for extracting PNC samples from full texts, addressing annotation challenges and analyzing LLM performance and errors.
Findings
LLMs struggle to extract all samples accurately
Self-consistency improves extraction performance
Three main error categories identified and analyzed
Abstract
This paper investigates the use of large language models (LLMs) for extracting sample lists of polymer nanocomposites (PNCs) from full-length materials science research papers. The challenge lies in the complex nature of PNC samples, which have numerous attributes scattered throughout the text. The complexity of annotating detailed information on PNCs limits the availability of data, making conventional document-level relation extraction techniques impractical due to the challenge in creating comprehensive named entity span annotations. To address this, we introduce a new benchmark and an evaluation technique for this task and explore different prompting strategies in a zero-shot manner. We also incorporate self-consistency to improve the performance. Our findings show that even advanced LLMs struggle to extract all of the samples from an article. Finally, we analyze the errors…
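The abstract does not say how self-consistency is applied to sample lists, so the following is only a minimal sketch under one assumption: the model is queried several times and a sample is kept when it appears in more than half of the runs. The function names, the `min_fraction` threshold, and the attribute keys (`matrix`, `filler`, `filler_wt_pct`) are hypothetical and not taken from the paper.

```python
from collections import Counter


def canonical(sample: dict) -> tuple:
    """Turn a sample dict into a hashable key that ignores key order and letter case."""
    return tuple(sorted((k, str(v).strip().lower()) for k, v in sample.items()))


def self_consistent_samples(runs: list, min_fraction: float = 0.5) -> list:
    """Keep samples that appear in more than `min_fraction` of the runs (assumed voting rule)."""
    counts = Counter()
    first_seen = {}
    for run in runs:
        seen_in_run = set()
        for sample in run:
            key = canonical(sample)
            if key in seen_in_run:
                continue  # count each distinct sample at most once per run
            seen_in_run.add(key)
            counts[key] += 1
            first_seen.setdefault(key, sample)
    needed = min_fraction * len(runs)
    return [first_seen[key] for key, count in counts.items() if count > needed]


# Three hypothetical extraction runs over the same article.
runs = [
    [
        {"matrix": "PMMA", "filler": "silica", "filler_wt_pct": "2"},
        {"matrix": "PMMA", "filler": "silica", "filler_wt_pct": "5"},
    ],
    [
        {"matrix": "PMMA", "filler": "Silica", "filler_wt_pct": "2"},
        {"matrix": "PMMA", "filler": "silica", "filler_wt_pct": "10"},
    ],
    [
        {"matrix": "PMMA", "filler": "silica", "filler_wt_pct": "2"},
        {"matrix": "PMMA", "filler": "silica", "filler_wt_pct": "5"},
    ],
]

kept = self_consistent_samples(runs)
for sample in kept:
    print(sample)
```

In this toy input the 10 wt% sample appears in only one of three runs and is dropped, while the 2 wt% and 5 wt% samples survive the vote. A real pipeline would likely need fuzzier matching than exact attribute equality.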
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction
