TL;DR
PrimeKG-CL introduces a realistic continual learning benchmark for biomedical knowledge graphs, capturing their asynchronous evolution and evaluating multiple strategies and models on real-world data.
Contribution
It provides the first benchmark built from authoritative biomedical databases with genuine temporal snapshots, multimodal features, and detailed stratification for continual graph learning.
Findings
Decoder choice and continual learning strategy interact strongly in determining performance.
Standard metrics conflate retention of valid facts with forgetting outdated ones.
Multimodal features significantly improve entity-level tasks.
Abstract
Biomedical knowledge graphs underwrite drug repurposing and clinical decision support, yet the upstream ontologies they depend on update on independent cycles that add millions of edges and deprecate hundreds of thousands more between releases. Existing continual graph learning (CGL), however, has been studied almost exclusively on synthetic random splits of static, generic knowledge graphs (KGs), a regime that cannot reproduce the asynchronous, structured evolution that real biomedical KGs undergo. To this end, we introduce PrimeKG-CL, a CGL benchmark built from nine authoritative biomedical databases (129K+ nodes, 8.1M+ edges, 10 node types, 30 relation types) with two genuine temporal snapshots (June 2021, July 2023; 5.83M edges added, 889K removed, 7.21M persistent), 10 entity-type-grouped tasks, multimodal node features, and a per-task persistent/added/removed test stratification. On three tasks (biomedical…
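The persistent/added/removed stratification mentioned in the abstract can be illustrated with plain set operations over two snapshots. The following is a minimal sketch, not the benchmark's actual code: the function name `stratify_edges`, the `(head, relation, tail)` triple representation, and the toy edges are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical edge representation: (head entity, relation, tail entity).
Triple = Tuple[str, str, str]


def stratify_edges(old: Set[Triple], new: Set[Triple]) -> Dict[str, Set[Triple]]:
    """Split the edges of two snapshots into three strata.

    persistent: present in both snapshots
    added:      present only in the newer snapshot
    removed:    present only in the older snapshot
    """
    return {
        "persistent": old & new,
        "added": new - old,
        "removed": old - new,
    }


# Toy snapshots (invented data, standing in for the 2021 and 2023 releases).
snapshot_old = {
    ("drug_a", "treats", "disease_x"),
    ("drug_b", "treats", "disease_y"),
    ("gene_1", "associated_with", "disease_x"),
}
snapshot_new = {
    ("drug_a", "treats", "disease_x"),
    ("gene_1", "associated_with", "disease_x"),
    ("drug_c", "treats", "disease_y"),
}

strata = stratify_edges(snapshot_old, snapshot_new)
for name, edges in strata.items():
    print(name, len(edges))
```

Reporting a metric separately on each stratum is what lets an evaluation distinguish retaining still-valid (persistent) facts from continuing to predict deprecated (removed) ones, which a single aggregate score would conflate.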
