OpenExtract: Automated Data Extraction for Systematic Reviews in Health

Jim Achterberg; Bram Van Dijk; Jing Meng; Saif Ul Islam; Gregory Epiphaniou; Carsten Maple; Xuefei Ding; Theodoros N. Arvanitis; Simon Brouwer; Marcel Haas; Marco Spruit

arXiv:2603.13338 · cs.IR · March 17, 2026

TL;DR

OpenExtract is an open-source tool that uses large language models to automate data extraction for systematic reviews. On a digital health literature review, it reaches precision and recall above 0.8 when compared with human researchers.

Contribution

It introduces an LLM-based pipeline for automated data extraction in systematic reviews and shows that its extractions reach precision and recall above 0.8 when compared with those of human researchers.

Findings

01. Achieves precision and recall above 0.8 in data extraction, measured against human researchers

02. Evaluated on a systematic literature review in digital health

03. Aims to reduce the manual effort of data extraction in literature reviews
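The precision and recall figures compare the pipeline's extractions with those of human researchers. The page does not say how predicted entries are matched to the human reference, so the following is a minimal sketch under an assumed scheme: each extraction is an (article, field, value) triple, values are normalized by lowercasing and collapsing whitespace (hypothetical), and a prediction counts as a true positive only if it exactly matches a human-extracted triple. Precision is then the share of predicted entries that match, and recall the share of human entries that are recovered. The names `precision_recall`, `normalize` and the example data are illustrative, not taken from OpenExtract.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (article_id, field, value); hypothetical schema


def normalize(value: str) -> str:
    # Assumed normalization: case-insensitive, whitespace-insensitive comparison
    return " ".join(value.lower().split())


def to_set(entries: Iterable[Triple]) -> Set[Triple]:
    # Duplicate entries collapse into one after normalization
    return {(article, field, normalize(value)) for article, field, value in entries}


def precision_recall(predicted: Iterable[Triple], reference: Iterable[Triple]) -> Tuple[float, float]:
    pred, ref = to_set(predicted), to_set(reference)
    true_positives = len(pred & ref)  # exact matches with the human reference
    # Assumed convention: an empty set scores 0.0 instead of raising ZeroDivisionError
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    # Illustrative data only, not results from the paper
    human = [
        ("a1", "country", "Netherlands"),
        ("a1", "sample_size", "120"),
        ("a2", "country", "UK"),
        ("a2", "sample_size", "45"),
    ]
    llm = [
        ("a1", "country", "netherlands "),
        ("a1", "sample_size", "12"),
        ("a2", "country", "UK"),
    ]
    p, r = precision_recall(llm, human)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Exact matching is strict: a predicted "12" against a human "120" counts as both a false positive and a false negative. An actual evaluation may instead rely on reviewer judgement or fuzzy matching.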

Abstract

This study presents OpenExtract, an open-source pipeline for automated data extraction in large-scale systematic literature reviews. The pipeline queries large language models (LLMs) to predict data entries based on relevant sections of scientific articles. To test the efficacy of OpenExtract, we apply it to a systematic literature review in digital health and compare its outputs with those of human researchers. OpenExtract achieves precision and recall scores above 0.8 on this task, indicating that it can extract data automatically and efficiently. OpenExtract is available at https://github.com/JimAchterbergLUMC/OpenExtract.
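The abstract states only that the pipeline queries LLMs to predict data entries from relevant sections of each article. The sketch below illustrates that idea under our own assumptions: articles are already split into a dict of section bodies, relevance is ranked by simple keyword overlap (a stand-in; OpenExtract may use a different retrieval method), the prompt wording and the `NOT REPORTED` convention are hypothetical, and `llm` is any callable mapping a prompt string to a response string rather than a specific provider API. `fake_llm` is an offline stub for demonstration only.

```python
import re
from typing import Callable, Dict, List, Optional


def tokens(text: str) -> set:
    # Lowercase alphanumeric tokens; underscores split words ("sample_size" -> sample, size)
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_sections(sections: Dict[str, str], query: str, k: int = 2) -> List[str]:
    """Return the k section bodies sharing the most tokens with the query.

    Keyword overlap is an assumed stand-in for OpenExtract's relevance step.
    Ties keep the original section order because sorted() is stable.
    """
    q = tokens(query)
    ranked = sorted(sections.items(), key=lambda item: len(q & tokens(item[1])), reverse=True)
    return [body for _, body in ranked[:k]]


def build_prompt(field: str, description: str, context: List[str]) -> str:
    # Hypothetical prompt wording, not taken from OpenExtract
    excerpt = "\n\n".join(context)
    return (
        f"Extract '{field}' ({description}) from the article excerpt below. "
        "Reply with the value only, or NOT REPORTED if it is absent.\n\n"
        f"{excerpt}"
    )


def extract(
    sections: Dict[str, str],
    fields: Dict[str, str],
    llm: Callable[[str], str],
) -> Dict[str, Optional[str]]:
    """Predict one data entry per field; `llm` is any prompt -> text callable."""
    entries: Dict[str, Optional[str]] = {}
    for field, description in fields.items():
        context = select_sections(sections, f"{field} {description}")
        answer = llm(build_prompt(field, description, context)).strip()
        entries[field] = None if answer.upper() == "NOT REPORTED" else answer
    return entries


def fake_llm(prompt: str) -> str:
    # Offline stand-in for a real model client: answers only the sample-size field
    match = re.search(r"(\d+) participants", prompt)
    if "'sample_size'" in prompt and match:
        return match.group(1)
    return "NOT REPORTED"


if __name__ == "__main__":
    sections = {
        "Abstract": "We evaluate a mobile app for diabetes self-management.",
        "Methods": "We recruited 120 participants with type 2 diabetes in the Netherlands.",
        "Discussion": "Future work should assess long-term adherence.",
    }
    fields = {
        "sample_size": "number of participants recruited",
        "funding": "source of study funding",
    }
    print(extract(sections, fields, fake_llm))  # {'sample_size': '120', 'funding': None}
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM provider; replacing keyword overlap with, for example, embedding similarity would change only `select_sections`.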

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Meta-analysis and systematic reviews · Computational and Text Analysis Methods