Outcome-based Evaluation of Systematic Review Automation
Wojciech Kusa, Guido Zuccon, Petr Knoth, Allan Hanbury

TL;DR
This paper introduces a novel evaluation framework for systematic review automation that considers the influence of each publication on the review outcome, moving beyond traditional binary relevance measures.
Contribution
The paper proposes an outcome-based evaluation method that measures how the retrieved studies affect the systematic review's final results, giving a more realistic picture of system performance than counting relevant documents alone.
Findings
Traditional measures may misjudge system quality by ignoring study impact.
The new framework alters system rankings compared to standard IR metrics.
An assessment of 74 runs shows that system quality is judged differently once review outcomes are taken into account.
Abstract
Current methods of evaluating search strategies and automated citation screening for systematic literature reviews typically rely on counting the number of relevant and not relevant publications. This established practice, however, does not accurately reflect the reality of conducting a systematic review, because not all included publications have the same influence on the final outcome of the systematic review. More specifically, if an important publication gets excluded or included, this might significantly change the overall review outcome, while not including or excluding less influential studies may only have a limited impact. However, in terms of evaluation measures, all inclusion and exclusion decisions are treated equally and, therefore, failing to retrieve publications with little to no impact on the review outcome leads to the same decrease in recall as failing to retrieve…
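To make the argument concrete, here is a minimal sketch of why equal recall can hide very different effects on a review's outcome. It is not the paper's evaluation measure. It assumes, purely for illustration, that the review outcome is a fixed-effect inverse-variance pooled estimate, and the study names, effect sizes, variances, and the helper functions `recall`, `pooled_effect`, and `outcome_shift` are all hypothetical.

```python
# Hypothetical included studies: name -> (effect estimate, variance).
# Study "A" is large and precise; "B" and "C" are small and noisy.
included = {
    "A": (0.50, 0.01),
    "B": (0.10, 0.25),
    "C": (0.30, 0.25),
}


def recall(retrieved, relevant):
    """Standard recall: every relevant study counts the same."""
    return len(retrieved & set(relevant)) / len(relevant)


def pooled_effect(studies):
    """Fixed-effect pooled estimate with inverse-variance weights."""
    weights = {name: 1.0 / var for name, (_, var) in studies.items()}
    total = sum(weights.values())
    return sum(weights[name] * studies[name][0] for name in studies) / total


def outcome_shift(retrieved, relevant):
    """Absolute change in the pooled estimate when only the retrieved
    relevant studies enter the analysis (None if none were retrieved)."""
    found = {name: relevant[name] for name in relevant if name in retrieved}
    if not found:
        return None
    return abs(pooled_effect(relevant) - pooled_effect(found))


# Two hypothetical runs that each miss exactly one included study.
run_missing_a = {"B", "C"}
run_missing_c = {"A", "B"}

for label, run in [("misses A", run_missing_a), ("misses C", run_missing_c)]:
    print(label, "recall:", recall(run, included),
          "outcome shift:", outcome_shift(run, included))
```

Both runs have the same recall, yet under these assumed numbers the run that misses the heavily weighted study moves the pooled estimate far more than the run that misses a low-weight one. A count-based measure cannot see this difference.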
