Building evidence-based knowledge bases from full-text literature for disease-specific biomedical reasoning
Chang Zong, Sicheng Lv, Si-tu Xue, Huilin Zheng, Jian Wan, Lei Zhang

TL;DR
EvidenceNet is a novel disease-specific dataset of structured biomedical evidence extracted from full-text literature, enabling improved reasoning, question answering, and link prediction in biomedical research.
Contribution
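The paper introduces EvidenceNet, a disease-specific evidence dataset built with a large language model (LLM)-assisted pipeline that extracts experimentally grounded findings from full-text biomedical literature, normalizes entities, scores evidence quality, and links related records into a graph.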
Findings
High extraction accuracy (98.3%) at the field level
EvidenceNet supports retrieval-augmented question answering
EvidenceNet enables graph-based tasks such as link prediction and target prioritization
Abstract
Biomedical knowledge resources often either preserve evidence as unstructured text or compress it into flat triples that omit study design, provenance, and quantitative support. Here we present EvidenceNet, a disease-specific dataset of record-level evidence collections and corresponding graph representations derived from full-text biomedical literature. EvidenceNet uses a large language model (LLM)-assisted pipeline to extract experimentally grounded findings as structured evidence records, normalize biomedical entities, score evidence quality, and connect related records through typed semantic relations. We release EvidenceNet-HCC with 7,872 evidence records and a corresponding graph with 10,328 nodes and 49,756 edges, and EvidenceNet-CRC with 6,622 records and a corresponding graph with 8,795 nodes and 39,361 edges. Technical validation shows high component fidelity, including 98.3%…
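To make the record-and-graph structure described in the abstract concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' schema or pipeline: the field names (`record_id`, `source`, `study_design`, `finding`, `entities`, `quality_score`), the `mentions` edge type, the `supports` relation, and the example values are all hypothetical assumptions chosen to mirror the ideas of provenance, study design, quality scoring, and typed relations between records.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

# Hypothetical record layout; the real EvidenceNet schema may differ.
@dataclass
class EvidenceRecord:
    record_id: str
    source: str            # provenance, e.g. an article identifier (assumed)
    study_design: str      # e.g. "in vitro", "cohort" (assumed vocabulary)
    finding: str           # free-text statement of the finding
    entities: List[str]    # normalized entity names mentioned by the record
    quality_score: float   # evidence quality score (assumed scale 0..1)

Node = Tuple[str, str]           # (node kind, identifier)
Edge = Tuple[Node, str, Node]    # (source node, relation type, target node)

def build_graph(records: List[EvidenceRecord],
                relations: List[Tuple[str, str, str]]) -> Tuple[Set[Node], List[Edge]]:
    """Build a simple typed graph from records and record-to-record relations."""
    nodes: Set[Node] = set()
    edges: List[Edge] = []
    for rec in records:
        rec_node = ("record", rec.record_id)
        nodes.add(rec_node)
        for name in rec.entities:
            ent_node = ("entity", name)
            nodes.add(ent_node)  # the set deduplicates shared entities
            edges.append((rec_node, "mentions", ent_node))
    # Typed semantic relations connect one record to another.
    for src_id, rel_type, dst_id in relations:
        edges.append((("record", src_id), rel_type, ("record", dst_id)))
    return nodes, edges

# Made-up example data, for illustration only.
records = [
    EvidenceRecord("r1", "paper-A", "in vitro", "Finding one",
                   ["TP53", "hepatocellular carcinoma"], 0.8),
    EvidenceRecord("r2", "paper-B", "cohort", "Finding two",
                   ["TP53", "sorafenib"], 0.6),
]
relations = [("r1", "supports", "r2")]
nodes, edges = build_graph(records, relations)
```

In this sketch, both records mention the same entity, so they share one entity node; that shared-node structure is what would let graph-based tasks such as link prediction operate over the evidence.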
