OptimusKG: Unifying biomedical knowledge in a modern multimodal graph
Lucas Vittor, Ayush Noori, Iñaki Arango, Joaquín Polonuer, Sam Rodriques, Andrew White, David A. Clifton, Marinka Zitnik

TL;DR
OptimusKG is a comprehensive, schema-enforced multimodal biomedical knowledge graph integrating diverse structured resources to support advanced biomedical research and machine learning applications.
Contribution
It introduces a large, unified biomedical graph with schema constraints, detailed metadata, and provenance, validated against scientific literature evidence.
Findings
70.0% of edges supported by literature evidence
83.4% of false edges lacked supporting evidence
Graph enables biomedical discovery and hypothesis generation
Abstract
Biomedical knowledge graphs (KGs) are widely used in the life sciences, yet many are derived from unstructured documents and therefore lack schema-level constraints, whereas graphs assembled from structured resources are difficult to harmonize into a unified representation. We present OptimusKG, a multimodal biomedical labeled property graph (LPG) built from structured and semi-structured resources to preserve factual, type-specific metadata across molecular, anatomical, clinical, and environmental domains. OptimusKG contains 190,531 nodes across 10 entity types, 21,813,816 edges across 26 relation types, and 67,249,863 property instances encoding 110,276,843 values across 150 distinct property keys, derived from 18 ontologies and controlled vocabularies. The graph enforces a top-level schema for nodes and edges and retains granular, type-specific properties, cross-references, and…
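
To make the idea of a schema-enforced labeled property graph concrete, here is a minimal Python sketch. It is not OptimusKG's implementation or data model: the class names (PropertyGraph, SchemaError), the node labels "gene" and "disease", the relation "associated_with", and all identifiers and property values are hypothetical placeholders chosen for illustration. The sketch only shows the general pattern of rejecting nodes and edges that violate a top-level schema while letting each node and edge carry its own type-specific properties.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str                      # entity type (hypothetical, e.g. "gene")
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    relation: str                   # relation type (hypothetical)
    properties: dict = field(default_factory=dict)


class SchemaError(ValueError):
    """Raised when a node or edge violates the top-level schema."""


class PropertyGraph:
    """Toy labeled property graph that validates inserts against a schema."""

    def __init__(self, node_labels, edge_rules):
        # node_labels: set of allowed entity types
        # edge_rules: relation -> (required source label, required target label)
        self.node_labels = set(node_labels)
        self.edge_rules = dict(edge_rules)
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, label, properties=None):
        if label not in self.node_labels:
            raise SchemaError(f"unknown node label: {label!r}")
        self.nodes[node_id] = Node(node_id, label, dict(properties or {}))

    def add_edge(self, source, target, relation, properties=None):
        if relation not in self.edge_rules:
            raise SchemaError(f"unknown relation: {relation!r}")
        for endpoint in (source, target):
            if endpoint not in self.nodes:
                raise SchemaError(f"unknown node id: {endpoint!r}")
        expected = self.edge_rules[relation]
        actual = (self.nodes[source].label, self.nodes[target].label)
        if actual != expected:
            raise SchemaError(f"{relation!r} expects {expected}, got {actual}")
        self.edges.append(Edge(source, target, relation, dict(properties or {})))


# Hypothetical schema and placeholder data, for illustration only.
graph = PropertyGraph(
    node_labels={"gene", "disease"},
    edge_rules={"associated_with": ("gene", "disease")},
)
graph.add_node("G1", "gene", {"symbol": "GENE_A", "xrefs": ["ExampleDB:1"]})
graph.add_node("D1", "disease", {"name": "disease A"})
graph.add_edge("G1", "D1", "associated_with", {"provenance": "example_resource"})
```

In this sketch, an edge whose endpoints are reversed (disease to gene) or whose relation is not declared in edge_rules raises SchemaError instead of being stored, which is the kind of constraint a graph extracted from unstructured text typically cannot guarantee.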
