A Large-Scale, Cross-Disciplinary Corpus of Systematic Reviews
Pierre Achkar, Tim Gollub, Arno Simons, Harrisen Scells, Martin Potthast

TL;DR
This paper introduces a large, cross-disciplinary corpus of over 300,000 systematic reviews from all scientific fields, enabling benchmarking, artifact extraction, and meta-science analyses across disciplines.
Contribution
The authors present Webis-SR4ALL-26, a comprehensive, multi-domain dataset of systematic reviews with linked metadata and structured artifacts, along with tools for extraction and evaluation.
Findings
Large-scale corpus of 301,871 systematic reviews spanning all scientific fields covered by OpenAlex
Normalized search strategies enable cross-domain retrieval benchmarking
Release of corpus, pipeline, and code for community use
Abstract
Existing benchmarks for systematic reviewing remain limited either in scale or in disciplinary coverage, with some collections comprising only a modest number of topics and others focusing primarily on biomedical research. We present Webis-SR4ALL-26, a large-scale, cross-disciplinary corpus of 301,871 systematic reviews spanning all scientific fields as covered by OpenAlex. Using a multi-stage pre-processing pipeline, we link reviews to resolved OpenAlex metadata and reference lists and extract, when explicitly reported, structured method artifacts relevant to retrieval and screening. These artifacts include reported search strategies (Boolean queries or keyword lists) that we normalize into executable approximations, as well as reported inclusion and exclusion criteria. Together, these layers support cross-domain benchmarking of retrieval and screening components against review…
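
As a rough illustration of what normalizing a reported keyword list into an executable approximation could look like, the sketch below joins keywords into a single Boolean query string. It is not the authors' pipeline: the function name `keywords_to_boolean`, the default `OR` operator, the phrase quoting, and the case-insensitive deduplication are all assumptions made for this example.

```python
from typing import Iterable


def keywords_to_boolean(keywords: Iterable[str], operator: str = "OR") -> str:
    """Join a reported keyword list into one Boolean query string.

    Hypothetical helper for illustration only; the corpus's actual
    normalization rules are not described in the text above.
    """
    if operator not in {"AND", "OR"}:
        raise ValueError("operator must be 'AND' or 'OR'")

    terms = []
    seen = set()
    for raw in keywords:
        # Collapse runs of whitespace and trim the ends.
        term = " ".join(raw.split())
        if not term:
            continue
        # Drop duplicates that differ only in letter case.
        key = term.lower()
        if key in seen:
            continue
        seen.add(key)
        # Quote multi-word keywords so they are searched as phrases.
        terms.append(f'"{term}"' if " " in term else term)

    return f" {operator} ".join(terms)


if __name__ == "__main__":
    reported = ["systematic review", " meta-analysis ", "Systematic  Review", ""]
    print(keywords_to_boolean(reported))
```

Defaulting to `OR` treats a keyword list as a set of alternative terms. That is only one plausible reading of a reported list, so the operator is left as a parameter.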
