TL;DR
This paper introduces SR4CS, a large-scale, openly available collection of computer science systematic reviews designed to facilitate reproducible research on retrieval and screening automation.
Contribution
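SR4CS provides 1,212 computer science systematic reviews with their original expert-designed Boolean search queries, 104,316 resolved references, and structured methodological metadata, along with baseline experiments comparing retrieval methods and zero-shot language models.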
Findings
Baseline experiments reveal differences in precision, recall, and ranking among retrieval paradigms.
Naive zero-shot Boolean query generation has notable limitations.
The dataset supports reproducible evaluation of systematic review automation methods.
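As a rough illustration of how such retrieval baselines are typically scored, the sketch below uses the standard set-based precision and recall definitions plus recall@k for ranked output. It assumes that the relevant set for a review is its resolved references. The function names are hypothetical, and the paper's exact metrics may differ.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Set-based precision and recall for an unranked (e.g. Boolean) result set.

    Assumption: `relevant` holds the ids of a review's resolved references.
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def recall_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant ids that appear in the top-k of a ranked list."""
    if not relevant:
        return 0.0
    return len(set(ranking[:k]) & relevant) / len(relevant)


p, r = precision_recall({"a", "b", "c", "d"}, {"a", "b", "e"})
print(p, r)  # precision 0.5, recall 2/3
print(recall_at_k(["a", "x", "b", "y"], {"a", "b", "e"}, k=2))  # 1/3
```

Boolean retrieval yields an unordered set, so precision and recall apply directly. Ranked paradigms are usually compared at cutoffs such as k, which is why a rank-aware measure is included.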
Abstract
Systematic reviews are the standard method for synthesizing scientific evidence, but their creation requires substantial manual effort, particularly during retrieval and screening. While recent work has explored automating these steps, evaluation resources remain largely confined to the biomedical domain, limiting reproducible experimentation in other domains. This paper introduces SR4CS, a large-scale collection of systematic reviews in computer science, designed to support reproducible research on Boolean query generation, retrieval, and screening. The corpus comprises 1,212 systematic reviews with their original expert-designed Boolean search queries, 104,316 resolved references, and structured methodological metadata. For controlled evaluation, the original Boolean queries are additionally provided in a normalized, approximated form operating over titles and abstracts. To illustrate…
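To make "a normalized Boolean query operating over titles and abstracts" concrete, here is a minimal sketch of evaluating such a query against documents. The nested-tuple query representation, the tokenizer, and all function names are hypothetical and do not reflect SR4CS's actual format. Multi-word terms are treated as "all words present" rather than exact phrases, and wildcards are not supported.

```python
import re
from typing import Union

# Hypothetical representation (an assumption, not the dataset's format):
# a term is a str; a compound node is ("AND" | "OR" | "NOT", [children]).
Query = Union[str, tuple]


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(query: Query, tokens: set[str]) -> bool:
    """Recursively evaluate a Boolean query against a document's token set."""
    if isinstance(query, str):
        # Simplification: every word of the term must occur somewhere.
        return all(w in tokens for w in tokenize(query))
    op, children = query
    if op == "AND":
        return all(matches(c, tokens) for c in children)
    if op == "OR":
        return any(matches(c, tokens) for c in children)
    if op == "NOT":
        return not matches(children[0], tokens)
    raise ValueError(f"unknown operator: {op}")


def retrieve(query: Query, docs: dict[str, tuple[str, str]]) -> set[str]:
    """Return ids of documents whose title + abstract satisfy the query."""
    return {
        doc_id
        for doc_id, (title, abstract) in docs.items()
        if matches(query, tokenize(title + " " + abstract))
    }


docs = {
    "d1": ("A systematic review of test automation", "We survey tools."),
    "d2": ("Deep learning for code", "A benchmark study."),
}
query = ("AND", [("OR", ["systematic review", "mapping study"]), "test"])
print(retrieve(query, docs))  # {'d1'}
```

Restricting matching to titles and abstracts keeps evaluation reproducible without access to each original database's full-text fields, at the cost of approximating the original query semantics.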
