LLM-Assisted Abstract Screening with OLIVER: Evaluating Calibration and Single-Model vs. Actor-Critic Configurations in Literature Reviews
Kian Godhwani, David Benrimoh

TL;DR
This paper introduces OLIVER, an open-source LLM-assisted screening pipeline, evaluates multiple models and configurations across two non-Cochrane systematic reviews, and finds that actor-critic frameworks enhance screening accuracy and calibration over single models.
Contribution
The study presents OLIVER, compares LLM configurations, and demonstrates that actor-critic models improve screening performance and calibration in literature reviews.
Findings
Model performance varies widely across reviews.
Actor-critic framework improves discrimination and calibration.
Calibration remains weak in single-model setups.
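To make the terms in these findings concrete, here is a minimal sketch of how discrimination (AUC) and calibration might be computed from a model's inclusion probabilities and the human screening labels. The abstract does not say which calibration metrics the authors used; the Brier score and expected calibration error (ECE) below are common choices and are assumptions here, as are all function names.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Discrimination: chance that a random included abstract outscores
    a random excluded one (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one included and one excluded item")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean squared gap between predicted probability and outcome (lower is better)."""
    return sum((s - y) ** 2 for y, s in zip(labels, scores)) / len(labels)


def expected_calibration_error(
    labels: Sequence[int], scores: Sequence[float], n_bins: int = 10
) -> float:
    """Weighted average, over equal-width probability bins, of the gap between
    mean predicted probability and observed inclusion rate."""
    bins = [[] for _ in range(n_bins)]
    for y, s in zip(labels, scores):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 falls in the last bin
        bins[idx].append((y, s))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_score = sum(s for _, s in b) / len(b)
        inclusion_rate = sum(y for y, _ in b) / len(b)
        ece += len(b) / len(labels) * abs(mean_score - inclusion_rate)
    return ece
```

A model can rank abstracts well (high AUC) while still reporting probabilities that are systematically too high or too low, which is why the two kinds of metric are reported separately.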
Abstract
Introduction: Recent work suggests large language models (LLMs) can accelerate screening, but prior evaluations focus on earlier LLMs, standardized Cochrane reviews, single-model setups, and accuracy as the primary metric, leaving generalizability, configuration effects, and calibration largely unexamined. Methods: We developed OLIVER (Optimized LLM-based Inclusion and Vetting Engine for Reviews), an open-source pipeline for LLM-assisted abstract screening. We evaluated multiple contemporary LLMs across two non-Cochrane systematic reviews, assessing performance at both the full-text screening and final inclusion stages using accuracy, AUC, and calibration metrics. We further tested an actor-critic screening framework combining two lightweight models under three aggregation rules. Results: Across individual models, performance varied widely. In the smaller Review 1 (821…
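The actor-critic setup described in the Methods combines the outputs of two models under an aggregation rule. The abstract does not name the three rules, so the sketch below uses three hypothetical ones (mean, max, min) purely to illustrate the idea; the function names, the 0.5 threshold, and the rules themselves are assumptions and not OLIVER's actual implementation.

```python
from typing import Callable, Dict

# Hypothetical aggregation rules; the paper's three rules are not given in the abstract.
AGGREGATORS: Dict[str, Callable[[float, float], float]] = {
    "mean": lambda actor, critic: (actor + critic) / 2,  # average the two opinions
    "max": max,  # include if either model is confident (favors sensitivity)
    "min": min,  # include only if both models are confident (favors specificity)
}


def aggregate(actor_score: float, critic_score: float, rule: str = "mean") -> float:
    """Combine the actor's and critic's inclusion probabilities into one score."""
    if rule not in AGGREGATORS:
        raise ValueError(f"unknown rule {rule!r}; expected one of {sorted(AGGREGATORS)}")
    return AGGREGATORS[rule](actor_score, critic_score)


def screen(
    actor_score: float, critic_score: float, rule: str = "mean", threshold: float = 0.5
) -> bool:
    """Return True if the abstract should advance to the next screening stage."""
    return aggregate(actor_score, critic_score, rule) >= threshold
```

With an actor score of 0.8 and a critic score of 0.4, the "max" rule advances the abstract while the "min" rule excludes it, which shows how the choice of rule trades missed studies against extra screening work.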
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Meta-analysis and systematic reviews · Topic Modeling
