TL;DR
This paper investigates how various factors like training setup, hyperparameters, and data splits influence the performance of knowledge graph embedding models in drug discovery, emphasizing the need for comprehensive reporting for reproducibility.
Contribution
It analyses in detail how experimental factors affect KGE model performance in drug discovery, and argues that these factors must be reported for results to be reproducible.
Findings
Training setup significantly affects model performance.
Hyperparameters and initialisation seed influence results.
Performance rankings of models can change based on experimental choices.
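One way to make the seed and ranking findings concrete is to report each model's score as a mean and standard deviation across seeds, and to check whether the model ordering is the same for every seed. The sketch below is a minimal illustration, not the paper's evaluation code: the model names, the choice of metric, and all numeric values are hypothetical placeholders.

```python
from statistics import mean, stdev

# Hypothetical metric values (e.g. MRR) per model and random seed.
# These are illustrative placeholders, NOT results from the paper.
scores = {
    "model_a": {0: 0.30, 1: 0.34, 2: 0.29},
    "model_b": {0: 0.31, 1: 0.30, 2: 0.32},
}


def summarise(scores):
    """Return {model: (mean, sample standard deviation)} across seeds."""
    summary = {}
    for model, by_seed in scores.items():
        values = list(by_seed.values())
        summary[model] = (mean(values), stdev(values))
    return summary


def ranking_per_seed(scores):
    """Return {seed: [models sorted best-first]} for every seed."""
    seeds = sorted(next(iter(scores.values())))
    return {
        seed: sorted(scores, key=lambda m: scores[m][seed], reverse=True)
        for seed in seeds
    }


summary = summarise(scores)
rankings = ranking_per_seed(scores)

# The ranking is stable only if every seed yields the same ordering.
rank_is_stable = len({tuple(order) for order in rankings.values()}) == 1

for model, (avg, sd) in summary.items():
    print(f"{model}: mean={avg:.3f} sd={sd:.3f}")
print("ranking per seed:", rankings)
print("ranking stable across seeds:", rank_is_stable)
```

With these placeholder numbers the two models have the same mean, yet which one ranks first depends on the seed. A single-seed comparison would hide exactly this situation.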
Abstract
Knowledge Graphs (KG) and associated Knowledge Graph Embedding (KGE) models have recently begun to be explored in the context of drug discovery and have the potential to assist in key challenges such as target identification. In the drug discovery domain, KGs can be employed as part of a process which can result in lab-based experiments being performed, or impact on other decisions, incurring significant time and financial costs and, most importantly, ultimately influencing patient healthcare. For KGE models to have impact in this domain, a better understanding not only of performance, but also of the various factors which determine it, is required. In this study we investigate, over the course of many thousands of experiments, the predictive performance of five KGE models on two public drug discovery-oriented KGs. Our goal is not to focus on the best overall model or configuration,…
