Query-driven Document-level Scientific Evidence Extraction from Biomedical Studies
Massimiliano Pronesti, Joao Bettencourt-Silva, Paul Flanagan, Alessandra Pascale, Oisin Redmond, Anya Belz, Yufang Hou

TL;DR
This paper introduces a new dataset and a retrieval-augmented framework for document-level extraction of scientific evidence from biomedical studies, focusing on clinical questions with conflicting evidence.
Contribution
It creates the CochraneForest dataset and proposes URCA (Uniform Retrieval Clustered Augmentation), a retrieval-augmented generation method for evidence extraction from biomedical literature.
Findings
URCA outperforms existing methods by up to 10.3% in F1 score.
CochraneForest is a challenging testbed for evidence synthesis.
The approach advances automated extraction of biomedical evidence.
Abstract
Extracting scientific evidence from biomedical studies for clinical research questions (e.g., Does stem cell transplantation improve quality of life in patients with medically refractory Crohn's disease compared to placebo?) is a crucial step in synthesising biomedical evidence. In this paper, we focus on the task of document-level scientific evidence extraction for clinical questions with conflicting evidence. To support this task, we create a dataset called CochraneForest, leveraging forest plots from Cochrane systematic reviews. It comprises 202 annotated forest plots, associated clinical research questions, full texts of studies, and study-specific conclusions. Building on CochraneForest, we propose URCA (Uniform Retrieval Clustered Augmentation), a retrieval-augmented generation framework designed to tackle the unique challenges of evidence extraction. Our experiments show that…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Advanced Text Analysis Techniques
