Building evidence-based knowledge bases from full-text literature for disease-specific biomedical reasoning

Chang Zong; Sicheng Lv; Si-tu Xue; Huilin Zheng; Jian Wan; Lei Zhang

arXiv:2603.28325 · cs.CE · April 15, 2026

TL;DR

EvidenceNet is a novel disease-specific dataset of structured biomedical evidence extracted from full-text literature, enabling improved reasoning, question answering, and link prediction in biomedical research.

Contribution

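The paper introduces EvidenceNet, a disease-specific dataset built with a large language model (LLM)-assisted pipeline. The pipeline extracts experimentally grounded findings from full-text biomedical literature as structured evidence records, normalizes biomedical entities, scores evidence quality, and connects related records through typed semantic relations.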

Findings

01 Field-level extraction accuracy is high (98.3%)

02 EvidenceNet supports retrieval-augmented question answering

03 EvidenceNet enables graph-based tasks such as link prediction and target prioritization

Abstract

Biomedical knowledge resources often either preserve evidence as unstructured text or compress it into flat triples that omit study design, provenance, and quantitative support. Here we present EvidenceNet, a disease-specific dataset of record-level evidence collections and corresponding graph representations derived from full-text biomedical literature. EvidenceNet uses a large language model (LLM)-assisted pipeline to extract experimentally grounded findings as structured evidence records, normalize biomedical entities, score evidence quality, and connect related records through typed semantic relations. We release EvidenceNet-HCC with 7,872 evidence records and a corresponding graph with 10,328 nodes and 49,756 edges, and EvidenceNet-CRC with 6,622 records and a corresponding graph with 8,795 nodes and 39,361 edges. Technical validation shows high component fidelity, including 98.3%…
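
To make the record-plus-graph idea in the abstract concrete, here is a minimal Python sketch of how a structured evidence record might be turned into graph nodes and typed edges. It is not the authors' schema or pipeline: the `EvidenceRecord` fields, the `build_graph` helper, the relation names `mentions` and `shares_entity_with`, and all placeholder values (`GENE_A`, `DRUG_X`, `doc-1`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet


@dataclass(frozen=True)
class EvidenceRecord:
    """Hypothetical evidence record; field names are illustrative only."""
    record_id: str
    finding: str               # the extracted finding, as text
    study_design: str          # e.g. "in vitro", "cohort"
    source: str                # provenance: identifier of the source document
    entities: FrozenSet[str]   # normalized biomedical entity names
    quality: float             # assumed evidence-quality score in [0, 1]


def build_graph(records):
    """Return (nodes, edges) for a list of records.

    Nodes are (kind, name) pairs; edges are (head, relation, tail) triples.
    The two relation types used here are assumptions, not the paper's.
    """
    nodes, edges = set(), set()
    for rec in records:
        nodes.add(("record", rec.record_id))
        for entity in rec.entities:
            nodes.add(("entity", entity))
            edges.add((rec.record_id, "mentions", entity))
    # Link any two records that share at least one normalized entity.
    for a, b in combinations(records, 2):
        if a.entities & b.entities:
            edges.add((a.record_id, "shares_entity_with", b.record_id))
    return nodes, edges


# Placeholder records; none of these values come from the dataset.
records = [
    EvidenceRecord("r1", "placeholder finding 1", "in vitro", "doc-1",
                   frozenset({"GENE_A", "DRUG_X"}), 0.8),
    EvidenceRecord("r2", "placeholder finding 2", "cohort", "doc-2",
                   frozenset({"GENE_A"}), 0.6),
    EvidenceRecord("r3", "placeholder finding 3", "in vivo", "doc-3",
                   frozenset({"GENE_B"}), 0.7),
]
nodes, edges = build_graph(records)
```

Unlike a flat triple, each record in this sketch carries its study design, provenance, and quality score, so a downstream retrieval or link-prediction step could filter or weight edges by those fields.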
