Query-driven Document-level Scientific Evidence Extraction from Biomedical Studies

Massimiliano Pronesti; Joao Bettencourt-Silva; Paul Flanagan; Alessandra Pascale; Oisin Redmond; Anya Belz; Yufang Hou

arXiv:2505.06186 · cs.CL · June 2, 2025

TL;DR

This paper introduces a new dataset and a retrieval-augmented generation framework for document-level extraction of scientific evidence from biomedical studies, focusing on clinical questions with conflicting evidence.

Contribution

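It creates the CochraneForest dataset, built from forest plots in Cochrane systematic reviews, and proposes URCA (Uniform Retrieval Clustered Augmentation), a retrieval-augmented generation framework for evidence extraction from biomedical literature.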

Findings

01. URCA outperforms existing methods by up to 10.3% in F1 score.

02. CochraneForest is a challenging testbed for evidence synthesis.

03. The approach advances automated extraction of biomedical evidence.

Abstract

Extracting scientific evidence from biomedical studies for clinical research questions (e.g., Does stem cell transplantation improve quality of life in patients with medically refractory Crohn's disease compared to placebo?) is a crucial step in synthesising biomedical evidence. In this paper, we focus on the task of document-level scientific evidence extraction for clinical questions with conflicting evidence. To support this task, we create a dataset called CochraneForest, leveraging forest plots from Cochrane systematic reviews. It comprises 202 annotated forest plots, associated clinical research questions, full texts of studies, and study-specific conclusions. Building on CochraneForest, we propose URCA (Uniform Retrieval Clustered Augmentation), a retrieval-augmented generation framework designed to tackle the unique challenges of evidence extraction. Our experiments show that…
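As an illustration of the dataset description above, the following is a minimal sketch of how one CochraneForest-style record might be represented. The class names, field names, and conclusion labels are hypothetical and are not taken from the paper or its released data; the sketch only mirrors the four components the abstract lists (forest plot, research question, study full texts, study-specific conclusions).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StudyEntry:
    """One study in a forest plot (hypothetical field names)."""
    study_id: str
    full_text: str
    conclusion: str  # study-specific conclusion; the label set is an assumption


@dataclass
class ForestPlotRecord:
    """One annotated forest plot and its research question (hypothetical layout)."""
    plot_id: str
    question: str
    studies: List[StudyEntry] = field(default_factory=list)

    def conclusion_counts(self) -> Dict[str, int]:
        """Count how many studies carry each conclusion label."""
        counts: Dict[str, int] = {}
        for study in self.studies:
            counts[study.conclusion] = counts.get(study.conclusion, 0) + 1
        return counts

    def has_conflicting_evidence(self) -> bool:
        """True when the studies do not all share one conclusion label."""
        return len(self.conclusion_counts()) > 1


# Placeholder data for illustration only; not drawn from the dataset.
record = ForestPlotRecord(
    plot_id="example-plot",
    question=(
        "Does stem cell transplantation improve quality of life in patients "
        "with medically refractory Crohn's disease compared to placebo?"
    ),
    studies=[
        StudyEntry("study-a", "full text placeholder", "favours_intervention"),
        StudyEntry("study-b", "full text placeholder", "no_difference"),
    ],
)

print(record.conclusion_counts())
print(record.has_conflicting_evidence())
```

Grouping studies under the question they address makes it straightforward to check whether a question has conflicting evidence, which is the setting the paper targets.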

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Advanced Text Analysis Techniques