TL;DR
This paper presents a machine learning pipeline that automates the key stages of systematic literature reviews: document retrieval, selection, and data extraction. It reports high accuracy after only about two weeks of expert annotation, generalizes to unseen data from other countries, and cuts review time by approximately 85%.
Contribution
The authors develop and evaluate a comprehensive ML pipeline for automating systematic reviews, demonstrating effective automation of document retrieval, selection, and data extraction.
Findings
High accuracy achieved with only 2 weeks of annotation
Pipeline generalizes well to unseen data from different countries
Automation reduces review time by approximately 85%
Abstract
Systematic reviews, which entail the extraction of data from large numbers of scientific documents, are an ideal avenue for the application of machine learning. They are vital to many fields of science and philanthropy, but are very time-consuming and require experts. Yet the three main stages of a systematic review are easily done automatically: searching for documents can be done via APIs and scrapers, selection of relevant documents can be done via binary classification, and extraction of data can be done via sequence-labelling classification. Despite the promise of automation for this field, little research exists that examines the various ways to automate each of these tasks. We construct a pipeline that automates each of these aspects, and experiment with many human-time vs. system quality trade-offs. We test the ability of classifiers to work well on small amounts of data and to…
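The selection and extraction stages described in the abstract can be illustrated with a small, standard-library-only sketch. It is not the authors' implementation. The classifier here is a toy Naive Bayes model standing in for whatever binary classifier the pipeline uses, the training snippets are invented, and the tag names `N` and `COUNTRY` are hypothetical labels chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class TinyNaiveBayes:
    """Binary multinomial Naive Bayes with add-one smoothing.

    A stand-in for the selection stage (relevant = 1, irrelevant = 0);
    the paper's actual classifier is not assumed here.
    """

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def _log_score(self, tokens, label):
        # Log prior plus smoothed log likelihood of each known token.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are skipped
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return int(self._log_score(tokens, 1) > self._log_score(tokens, 0))


def spans_from_bio(tokens, tags):
    """Turn per-token BIO tags (the output format of a sequence labeller)
    into (label, text) spans. An I- tag that does not continue an open
    span of the same label is treated as outside and dropped."""
    spans, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((label, " ".join(current)))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(tok)
        else:
            if current:
                spans.append((label, " ".join(current)))
            current, label = [], None
    if current:
        spans.append((label, " ".join(current)))
    return spans


# Invented training snippets, for illustration only.
train_texts = [
    "randomized trial of cash transfers on school enrollment",
    "impact evaluation of microfinance program on household income",
    "opinion piece on football season results",
    "recipe for tomato soup with basil",
]
train_labels = [1, 1, 0, 0]
selector = TinyNaiveBayes().fit(train_texts, train_labels)
```

Calling `selector.predict(...)` on a new title or abstract gives a keep/discard decision for the selection stage, and `spans_from_bio` shows how token-level tags from an extraction model would be collapsed into data fields. A real pipeline would replace both with trained models and feed them documents gathered by the search stage.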
