OpenExtract: Automated Data Extraction for Systematic Reviews in Health
Jim Achterberg, Bram Van Dijk, Jing Meng, Saif Ul Islam, Gregory Epiphaniou, Carsten Maple, Xuefei Ding, Theodoros N. Arvanitis, Simon Brouwer, Marcel Haas, Marco Spruit

TL;DR
OpenExtract is an open-source pipeline that automates data extraction for systematic reviews by querying large language models. In a digital health literature review, it reached precision and recall above 0.8 when compared with human researchers.
Contribution
It introduces an LLM-based pipeline for automated data extraction in systematic reviews and reports precision and recall above 0.8 when its outputs are compared with those of human researchers.
Findings
Achieves > 0.8 precision and recall in data extraction
Evaluated on a systematic literature review in digital health
Can reduce manual effort in large-scale literature reviews
Abstract
This study presents OpenExtract, an open-source pipeline for automated data extraction in large-scale systematic literature reviews. The pipeline queries large language models (LLMs) to predict data entries based on relevant sections of scientific articles. To test the efficacy of OpenExtract, we apply it to a systematic literature review in digital health and compare its outputs with those of human researchers. OpenExtract achieves precision and recall scores of > 0.8 in this task, indicating that it can be effective at extracting data automatically and efficiently. OpenExtract: https://github.com/JimAchterbergLUMC/OpenExtract.
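The workflow the abstract describes (select relevant sections, query an LLM for data entries, score the result against human extractions) can be illustrated with a minimal sketch. This is not OpenExtract's actual code or API: the function names, the keyword-based section filter, the prompt wording, the semicolon-separated answer format, and the exact-match set comparison used for precision and recall are all assumptions made for illustration, and the LLM is replaced by a stub so the example runs offline.

```python
from typing import Callable, Dict, List, Set, Tuple


def select_sections(sections: Dict[str, str], keywords: List[str]) -> str:
    """Keep sections mentioning at least one keyword (hypothetical heuristic)."""
    kept = [
        text
        for text in sections.values()
        if any(k.lower() in text.lower() for k in keywords)
    ]
    return "\n\n".join(kept)


def extract_entries(
    sections: Dict[str, str],
    field: str,
    keywords: List[str],
    llm: Callable[[str], str],
) -> Set[str]:
    """Ask the LLM for one data field and normalize its answer to a set."""
    context = select_sections(sections, keywords)
    # Assumed prompt format; the real pipeline's prompts may differ.
    prompt = (
        f"List every {field} reported in the text, "
        f"separated by semicolons.\n\n{context}"
    )
    answer = llm(prompt)
    return {item.strip().lower() for item in answer.split(";") if item.strip()}


def precision_recall(predicted: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Exact-match precision and recall of predicted entries vs. human entries."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real LLM call (assumption for this sketch)."""
    return "Random Forest; logistic regression; SVM"


article = {
    "Introduction": "Digital health is growing.",
    "Methods": "We trained a random forest and a logistic regression model.",
}
human_entries = {"random forest", "logistic regression"}

predicted_entries = extract_entries(article, "model", ["trained"], fake_llm)
p, r = precision_recall(predicted_entries, human_entries)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=1.00
```

In this toy run the stub returns one entry the human annotators did not record, so precision drops to 2/3 while recall stays at 1.0. A real evaluation would likely need a softer matching rule than exact string equality, since LLM outputs and human entries rarely use identical wording.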
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Meta-analysis and systematic reviews · Computational and Text Analysis Methods
