Dynamic In-context Learning with Conversational Models for Data Extraction and Materials Property Prediction
Chinedu Ekuma

TL;DR
This paper presents PropertyExtractor, an open-source conversational LLM-based tool that accurately extracts and verifies material property data from scholarly papers, addressing data trustworthiness and creating valuable property databases.
Contribution
The paper introduces PropertyExtractor, a novel tool combining advanced conversational LLMs, in-context learning, and prompt engineering for autonomous material data extraction and verification.
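As a rough illustration of how zero-shot and few-shot in-context learning can share one prompt template, here is a minimal sketch. It is not PropertyExtractor's actual code: the function `build_prompt`, the instruction wording, the answer format, and the sample passages are all assumptions made for this example. With no worked examples the prompt is zero-shot; supplying examples turns the same template into a few-shot prompt.

```python
from typing import Sequence, Tuple


def build_prompt(property_name: str, passage: str,
                 examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Assemble an extraction prompt (hypothetical template, not the tool's own)."""
    parts = [
        f"Extract every reported value of '{property_name}' from the text.",
        "Answer with one 'material | value | unit' line per value, or 'none'.",
    ]
    # Few-shot part: each example pairs a passage with its expected answer.
    for example_text, example_answer in examples:
        parts.append(f"Text: {example_text}\nAnswer: {example_answer}")
    # The passage to process always comes last, with the answer left open.
    parts.append(f"Text: {passage}\nAnswer:")
    return "\n\n".join(parts)


# Illustrative passages only; they are not taken from the paper's data.
passage = "Monolayer MoS2 has a direct bandgap of 1.8 eV."
zero_shot = build_prompt("bandgap", passage)
few_shot = build_prompt(
    "bandgap",
    passage,
    examples=[("Bulk silicon has a bandgap of 1.12 eV.", "Si | 1.12 | eV")],
)
print(few_shot)
```

The resulting string would then be sent to whichever conversational model is in use; that call is omitted here because the listing does not describe the tool's interface.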
Findings
Achieves precision and recall above 95% in material data extraction, with an error rate of approximately 9%.
Effectively creates databases for 2D material thicknesses and energy bandgaps.
Demonstrates the tool's scalability and accuracy in material data curation.
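For readers unfamiliar with the metrics quoted above, the following sketch shows how precision and recall are conventionally computed for an extraction task. The helper `precision_recall` and the sample records are invented for illustration; they are not the paper's evaluation code or data, and the paper's exact matching criteria are not given in this listing.

```python
def precision_recall(extracted: set, reference: set):
    """Precision and recall of extracted records against a hand-checked reference set."""
    true_positives = len(extracted & reference)
    # Precision: share of extracted records that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of reference records that were found.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Invented records: (material, property, value) tuples.
reference = {("MoS2", "bandgap", "1.8 eV"), ("WS2", "bandgap", "2.0 eV")}
extracted = {("MoS2", "bandgap", "1.8 eV"), ("WSe2", "bandgap", "1.6 eV")}

p, r = precision_recall(extracted, reference)
print(f"precision={p:.2f} recall={r:.2f}")
```

Exact tuple matching is the strictest choice; a real evaluation might tolerate unit or formatting differences before counting a record as correct.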
Abstract
The advent of natural language processing and large language models (LLMs) has revolutionized the extraction of data from unstructured scholarly papers. However, ensuring data trustworthiness remains a significant challenge. In this paper, we introduce PropertyExtractor, an open-source tool that leverages advanced conversational LLMs like Google gemini-pro and OpenAI gpt-4, blends zero-shot with few-shot in-context learning, and employs engineered prompts for the dynamic refinement of structured information hierarchies, enabling autonomous, efficient, scalable, and accurate identification, extraction, and verification of material property data. Our tests on material data demonstrate precision and recall that exceed 95% with an error rate of approximately 9%, highlighting the effectiveness and versatility of the toolkit. Finally, databases for 2D material thicknesses, a critical…
Taxonomy
Topics: Image Processing and 3D Reconstruction
Methods: Attention Is All You Need · Dense Connections · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Absolute Position Encodings · Byte Pair Encoding · Adam · Dropout
