Scalable and interpretable rule-based link prediction for large heterogeneous knowledge graphs
Simon Ott, Laura Graf, Asan Agibetov, Christian Meilicke, Matthias Samwald

TL;DR
This paper introduces SAFRAN, a scalable, interpretable rule-based framework that improves link prediction in large biomedical knowledge graphs, achieving state-of-the-art results among fully interpretable methods and faster inference.
Contribution
SAFRAN improves the rule-based link prediction approach of AnyBURL through scalable rule aggregation, enabling it to outperform existing fully interpretable methods on large-scale biomedical datasets.
Findings
SAFRAN achieves state-of-the-art results for fully interpretable link prediction on FB15K-237 and OpenBioLink.
Inference speeds are increased by up to two orders of magnitude.
SAFRAN narrows the performance gap between rule-based and embedding-based methods.
Abstract
Neural embedding-based machine learning models have shown promise for predicting novel links in biomedical knowledge graphs. Unfortunately, their practical utility is diminished by their lack of interpretability. Recently, the fully interpretable, rule-based algorithm AnyBURL yielded highly competitive results on many general-purpose link prediction benchmarks. However, its applicability to large-scale prediction tasks on complex biomedical knowledge bases is limited by long inference times and difficulties with aggregating predictions made by multiple rules. We improve upon AnyBURL by introducing the SAFRAN rule application framework which aggregates rules through a scalable clustering algorithm. SAFRAN yields new state-of-the-art results for fully interpretable link prediction on the established general-purpose benchmark FB15K-237 and the large-scale biomedical benchmark OpenBioLink.…
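To illustrate why aggregating predictions made by multiple rules is difficult, here is a minimal sketch in Python. It assumes each rule that fires for a candidate entity carries a confidence in [0, 1], and that redundant rules have already been grouped into clusters. Taking the maximum within a cluster and a noisy-or across clusters is one plausible way to combine them, chosen here for illustration; the abstract does not spell out the aggregation details, and all function and variable names are hypothetical.

```python
from collections import defaultdict


def aggregate_max(confidences):
    """Score a candidate by its single most confident rule."""
    return max(confidences)


def aggregate_noisy_or(confidences):
    """Treat every rule as independent evidence: 1 - prod(1 - c)."""
    remaining = 1.0
    for c in confidences:
        remaining *= 1.0 - c
    return 1.0 - remaining


def aggregate_clustered(rule_confidences, cluster_of):
    """Max inside each cluster of redundant rules, noisy-or across clusters.

    rule_confidences: dict mapping rule id -> confidence (assumed input format)
    cluster_of:       dict mapping rule id -> cluster id (assumed precomputed)
    """
    best_per_cluster = defaultdict(float)
    for rule, conf in rule_confidences.items():
        cid = cluster_of[rule]
        best_per_cluster[cid] = max(best_per_cluster[cid], conf)
    return aggregate_noisy_or(best_per_cluster.values())


# Toy example: r1 and r2 are redundant (same cluster), r3 is independent.
rule_confidences = {"r1": 0.8, "r2": 0.8, "r3": 0.5}
cluster_of = {"r1": "A", "r2": "A", "r3": "B"}

score_max = aggregate_max(rule_confidences.values())
score_noisy_or = aggregate_noisy_or(rule_confidences.values())
score_clustered = aggregate_clustered(rule_confidences, cluster_of)
```

In this toy case, the maximum ignores the independent evidence from r3, while plain noisy-or counts the redundant pair r1 and r2 twice. The clustered score lies between the two, which is the intuition behind grouping redundant rules before aggregating.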
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Bioinformatics and Genomic Networks · Genomics and Rare Diseases
Methods: SAFRAN (Scalable and fast non-redundant rule application) · Symbolic rule learning
