Measuring Risk of Bias in Biomedical Reports: The RoBBR Benchmark
Jianyou Wang, Weili Cao, Longtian Bao, Youze Zheng, Gil Pasternak, Kaicheng Wang, Xiaoyue Wang, Ramamohan Paturi, Leon Bergen

TL;DR
The RoBBR benchmark measures how well models can assess risk of bias in biomedical studies by testing whether their judgments of research methodology align with those of expert reviewers, in support of reliable evidence synthesis.
Contribution
This paper introduces a novel benchmark for measuring methodological quality in biomedical papers, utilizing expert annotations and large language models.
Findings
Large language models' reasoning and retrieval capabilities affect their bias assessment accuracy.
A human-validated annotation pipeline aligns reviewers' judgments with individual sentences in each paper.
The dataset supports improved evaluation of bias detection models.
Abstract
Systems that answer questions by reviewing the scientific literature are becoming increasingly feasible. To draw reliable conclusions, these systems should take into account the quality of available evidence from different studies, placing more weight on studies that use a valid methodology. We present a benchmark for measuring the methodological strength of biomedical papers, drawing on the risk-of-bias framework used for systematic reviews. Derived from over 500 biomedical studies, the three benchmark tasks encompass expert reviewers' judgments of studies' research methodologies, including the assessments of risk of bias within these studies. The benchmark contains a human-validated annotation pipeline for fine-grained alignment of reviewers' judgments with research paper sentences. Our analyses show that large language models' reasoning and retrieval capabilities impact their…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Clinical practice guidelines implementation · Meta-analysis and systematic reviews
