Exploring LLMs for Scientific Information Extraction Using The SciEx Framework
Sha Li, Ayush Sadekar, Nathan Self, Yiqi Su, Lars Andersland, Mira Chaplin, Annabel Zhang, Hyoju Yang, James B Henderson, Krista Wigginton, Linsey Marr, T.M. Murali, Naren Ramakrishnan

TL;DR
This paper introduces SciEx, a modular framework that enhances scientific information extraction from complex, multi-modal literature using LLMs, addressing challenges like long documents and schema changes.
Contribution
SciEx provides a flexible, extensible architecture for scientific information extraction, decoupling components to improve adaptability and performance over existing methods.
Findings
Effective extraction across diverse scientific topics
Identifies strengths of LLM-based pipelines
Highlights limitations and areas for improvement
Abstract
Large language models (LLMs) are increasingly touted as powerful tools for automating scientific information extraction. However, existing methods and tools often struggle with the realities of scientific literature: long-context documents, multi-modal content, and reconciling varied and inconsistent fine-grained information across multiple publications into standardized formats. These challenges are further compounded when the desired data schema or extraction ontology changes rapidly, making it difficult to re-architect or fine-tune existing systems. We present SciEx, a modular and composable framework that decouples key components including PDF parsing, multi-modal retrieval, extraction, and aggregation. This design streamlines on-demand data extraction while enabling extensibility and flexible integration of new models, prompting strategies, and reasoning mechanisms. We evaluate…
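To make the decoupling described in the abstract concrete, here is a minimal sketch in Python of four independently swappable stages (parsing, retrieval, extraction, aggregation) wired into one pipeline. Everything in it is an assumption made for illustration: the names Pipeline, parse, retrieve, extract, and aggregate are hypothetical and are not taken from SciEx, plain strings stand in for parsed PDFs, keyword matching stands in for multi-modal retrieval, and a regular expression stands in for the LLM extraction step.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins for each stage; none of this is the SciEx API.

def parse(raw: str) -> list[dict]:
    """Split a raw document (a string standing in for a PDF) into chunks."""
    return [{"text": p.strip()} for p in raw.split("\n\n") if p.strip()]

def retrieve(chunks: list[dict], field_name: str) -> list[dict]:
    """Keep only chunks that mention the requested schema field."""
    return [c for c in chunks if field_name.lower() in c["text"].lower()]

def extract(chunks: list[dict], field_name: str) -> list[dict]:
    """Pull 'field: value' or 'field = value' pairs (a regex replaces the LLM)."""
    pattern = re.compile(
        rf"{re.escape(field_name)}\s*[:=]\s*([^\s;,]+)", re.IGNORECASE
    )
    return [
        {"doc": c["doc"], "field": field_name, "value": m.group(1)}
        for c in chunks
        for m in pattern.finditer(c["text"])
    ]

def aggregate(records: list[dict]) -> dict:
    """Group extracted values by field across all documents."""
    table: dict = {}
    for r in records:
        table.setdefault(r["field"], []).append((r["doc"], r["value"]))
    return table

@dataclass
class Pipeline:
    """Holds one callable per stage so any stage can be replaced on its own."""
    parse: Callable
    retrieve: Callable
    extract: Callable
    aggregate: Callable

    def run(self, docs: dict, schema: list[str]) -> dict:
        records = []
        for name, raw in docs.items():
            # Tag every chunk with its source document for later aggregation.
            chunks = [dict(c, doc=name) for c in self.parse(raw)]
            for field_name in schema:
                hits = self.retrieve(chunks, field_name)
                records.extend(self.extract(hits, field_name))
        return self.aggregate(records)

docs = {
    "paper_a": "Intro text.\n\ntemperature: 25C; humidity: 40%",
    "paper_b": "Methods.\n\nTemperature = 37C",
}
pipeline = Pipeline(parse, retrieve, extract, aggregate)
result = pipeline.run(docs, ["temperature", "humidity"])
print(result)
```

In this sketch the schema is only a list of field names passed to run, so a schema change means passing a different list rather than rewriting a stage, and a different retriever or extractor can be supplied when constructing Pipeline without touching the other stages.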
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Advanced Text Analysis Techniques · Research Data Management Practices
