From Papers to Property Tables: A Priority-Based LLM Workflow for Materials Data Extraction
Koushik Rameshbabu, Jing Luo, Ali Shargh, Khalid A. El-Awady, Jaafar A. El-Awady

TL;DR
This paper introduces a hierarchical, prompt-driven LLM workflow that automatically extracts and reconstructs structured materials data from research articles, integrating information from text, tables, figures, and physics derivations with high accuracy.
Contribution
The authors develop a novel, priority-based LLM pipeline that automates comprehensive data extraction from scientific literature without task-specific fine-tuning, demonstrated on shock-physics experiments.
Findings
The workflow achieved over 94% accuracy in data extraction from research articles.
It effectively integrated text, tables, figures, and physics derivations for data reconstruction.
A scalable API implementation matched or exceeded the accuracy of the chat-based workflow.
Abstract
Scientific data are widely dispersed across research articles and are often reported inconsistently across text, tables, and figures, making manual data extraction and aggregation slow and error-prone. We present a prompt-driven, hierarchical workflow that uses a large language model (LLM) to automatically extract and reconstruct structured, shot-level shock-physics experimental records by integrating information distributed across text, tables, figures, and physics-based derivations from full-text published research articles, using alloy spall strength as a representative case study. The pipeline targeted 37 experimentally relevant fields per shot and applied a three-level priority strategy: (T1) direct extraction from text/tables, (T2) physics-based derivation using verified governing relations, and (T3) digitization from figures when necessary. Extracted values were normalized to…
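The three-level priority strategy described in the abstract can be pictured as a simple fallback chain: try direct extraction first, then a physics-based derivation, then figure digitization. The sketch below is a minimal illustration under that reading, not the authors' implementation. The names `FieldValue`, `Source`, and `resolve_field` are hypothetical, and each tier is represented by a placeholder callable rather than an actual LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FieldValue:
    """A resolved field together with the tier that produced it (hypothetical type)."""
    value: float
    tier: str  # "T1", "T2", or "T3"


# Assumption: each tier is modeled as a callable that returns a number,
# or None when that tier cannot supply the field.
Source = Callable[[], Optional[float]]


def resolve_field(from_text: Source,
                  from_physics: Source,
                  from_figure: Source) -> Optional[FieldValue]:
    """Return the first available value, trying T1, then T2, then T3."""
    tiers = (("T1", from_text), ("T2", from_physics), ("T3", from_figure))
    for tier, source in tiers:
        value = source()
        if value is not None:
            return FieldValue(value=value, tier=tier)
    return None  # no tier could supply the field


# Usage with made-up numbers: the text/table tier has nothing,
# so the derived value is used and the figure tier is never consulted.
result = resolve_field(
    from_text=lambda: None,
    from_physics=lambda: 2.5,
    from_figure=lambda: 3.0,
)
print(result)
```

Recording the tier alongside each value keeps provenance visible, so a downstream check could treat derived or digitized values differently from directly reported ones.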
