Open Biomedical Knowledge Graphs at Scale: Construction, Federation, and AI Agent Access with Samyama Graph Database
Madhulatha Mandarapu, Sandeep Kunkunuru

TL;DR
This paper describes how three large-scale open biomedical knowledge graphs are constructed, federated, and made accessible to AI agents on the Samyama graph database, enabling efficient cross-source querying and high-accuracy LLM-based question answering.
Contribution
It introduces a reproducible ETL pattern for large-scale biomedical KGs, demonstrates cross-KG federation, and provides schema-driven MCP (Model Context Protocol) server tools that give LLMs high-accuracy access to the graphs.
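"Schema-driven" MCP tools can be illustrated with a small Python sketch. It assumes that tools are generated mechanically from the graph schema rather than hand-written. The schema slice, the tool names (`find_<label>_by_<property>`) and the Cypher templates are hypothetical, not taken from the paper. Each descriptor follows the MCP tool shape (`name`, `description`, `inputSchema` as JSON Schema). The query template is kept server-side, so the LLM supplies only a parameter value and never raw Cypher.

```python
import json

# Hypothetical slice of a KG schema: node label -> lookup properties.
# These labels and properties are illustrative assumptions, not the paper's schema.
SCHEMA = {
    "Protein": ["uniprot_id", "name"],
    "Pathway": ["reactome_id", "name"],
}


def make_tools(schema):
    """Generate one lookup tool per (label, property) pair.

    Returns MCP-style tool descriptors plus a server-side map from tool name
    to a parameterized Cypher template (hypothetical naming scheme).
    """
    tools, templates = [], {}
    for label, props in schema.items():
        for prop in props:
            name = f"find_{label.lower()}_by_{prop}"
            tools.append({
                "name": name,
                "description": f"Look up {label} nodes whose {prop} equals the given value.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"value": {"type": "string"}},
                    "required": ["value"],
                },
            })
            # Parameterized query: only $value comes from the model.
            templates[name] = f"MATCH (n:{label} {{{prop}: $value}}) RETURN n LIMIT 25"
    return tools, templates


tools, templates = make_tools(SCHEMA)
print(json.dumps([t["name"] for t in tools]))
print(templates["find_pathway_by_name"])
```

Generating tools from the schema keeps the tool list in sync with the graph as node types are added. Restricting the model to parameter values also limits malformed or unsafe queries.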
Findings
The federated graph loads in ~3 minutes on commodity hardware.
Single-KG queries complete in 80-100ms.
Cross-KG federation joins take 1-4 seconds.
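This excerpt does not describe the paper's federation mechanism. At its core, a cross-KG join combines results from two graphs on a shared identifier. The minimal Python sketch below shows that step as a generic hash join. It assumes the Drug Interactions KG and the Clinical Trials KG share a normalized drug name. The rows are toy stand-ins for single-KG query results, not data from the KGs.

```python
from collections import defaultdict

# Toy result rows standing in for two single-KG queries (illustrative only).
# Drug Interactions KG: drug-gene edges.
drug_gene_rows = [
    {"drug": "imatinib", "gene": "ABL1"},
    {"drug": "imatinib", "gene": "KIT"},
    {"drug": "metformin", "gene": "PRKAA1"},
]
# Clinical Trials KG: trials per intervention.
trial_rows = [
    {"trial": "trial-1", "drug": "imatinib"},
    {"trial": "trial-2", "drug": "aspirin"},
]


def hash_join(left, right, key):
    """Inner-join two row lists on a shared key.

    Builds a hash index over the right side, then probes it once per left row,
    so the cost is linear in the combined input size plus the output size.
    """
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    joined = []
    for row in left:
        # .get avoids inserting empty lists for keys with no match.
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


result = hash_join(drug_gene_rows, trial_rows, "drug")
for r in result:
    print(r["trial"], r["drug"], r["gene"])
```

Only imatinib appears in both inputs, so the join yields two rows, both for `trial-1`. The reported 1-4 second join times would also include running the per-KG queries, which this sketch omits.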
Abstract
Biomedical knowledge is fragmented across siloed databases: Reactome for pathways, STRING for protein interactions, ClinicalTrials.gov for study registries, DrugBank for drug vocabularies, DGIdb for drug-gene interactions, and SIDER for side effects. We present three open-source biomedical knowledge graphs built on Samyama, a high-performance graph database written in Rust: Pathways KG (118,686 nodes, 834,785 edges from 5 sources), Clinical Trials KG (7,774,446 nodes, 26,973,997 edges from 5 sources), and Drug Interactions KG (32,726 nodes, 191,970 edges from 3 sources). Our contributions are threefold. First, we describe a reproducible ETL pattern for constructing large-scale KGs from heterogeneous public data sources, with cross-source deduplication, batch loading (Python Cypher and Rust native loaders), and portable snapshot export. Second, we demonstrate cross-KG federation:…
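Two of the ETL steps named above are cross-source deduplication and batch loading through Cypher. The hedged Python sketch below assumes records are deduplicated on a case-normalized key and then sent in fixed-size batches as a single parameter list. The `UPSERT_GENES` statement uses openCypher `UNWIND`/`MERGE` and is an assumption: the paper's actual loader queries, and whether Samyama accepts this exact form, are not shown here. The records are toy data, and no database client is called.

```python
from itertools import islice

# Toy records from two sources with overlapping genes (illustrative only).
SOURCE_A = [{"symbol": "ABL1", "source": "A"}, {"symbol": "KIT", "source": "A"}]
SOURCE_B = [{"symbol": "kit", "source": "B"}, {"symbol": "TP53", "source": "B"}]

# Assumed openCypher-style batch upsert; not the paper's actual loader query.
UPSERT_GENES = "UNWIND $rows AS row MERGE (g:Gene {symbol: row.symbol})"


def dedupe(*sources, key="symbol"):
    """Merge records across sources, keeping the first record per normalized key."""
    seen = {}
    for source in sources:
        for rec in source:
            k = rec[key].upper()  # normalize case so 'kit' and 'KIT' collapse
            if k not in seen:
                seen[k] = {**rec, key: k}
    return list(seen.values())


def batches(rows, size):
    """Yield fixed-size chunks so each Cypher call carries one parameter list."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


rows = dedupe(SOURCE_A, SOURCE_B)
for chunk in batches(rows, size=2):
    # A real loader would send (UPSERT_GENES, {"rows": chunk}) to the database here.
    print(UPSERT_GENES, [r["symbol"] for r in chunk])
```

`MERGE` makes the load idempotent, so re-running a batch does not duplicate nodes. Batching amortizes per-query overhead across many rows, which matters at the Clinical Trials KG's scale of millions of nodes.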
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Bioinformatics and Genomic Networks
