Comic: explainable drug repurposing via contrastive masking for interpretable connections

Naafey Aamer; Muhammad Nabeel Asim; Andreas Dengel

PMC · DOI:10.1186/s12859-025-06337-4·January 9, 2026

Comic: explainable drug repurposing via contrastive masking for interpretable connections

Naafey Aamer, Muhammad Nabeel Asim, Andreas Dengel

PDF

Open Access

TL;DR

COMIC is a new AI tool that improves drug repurposing by identifying key drug-disease interactions and biological pathways, offering better performance and explainability.

Contribution

COMIC introduces a novel multi-channel architecture for explainable and high-performing drug repurposing predictions.

Findings

01

COMIC achieved a 9.55% average performance improvement over existing state-of-the-art methods.

02

The predictor successfully identified 21 out of 30 FDA-approved repurposed drug-disease pairs with high confidence.

03

A web-based interface was developed to enable real-time drug repurposing investigations with mechanistic pathway explanations.

Abstract

Many diseases worldwide remain untreated due to the slow and expensive process of drug development. Repurposing existing FDA-approved drugs offers a faster solution, especially with the assistance of artificial intelligence. Despite advancements in AI-driven drug repurposing, current approaches either have lackluster performance or fail to highlight the intricate pathways through which drugs act on diseases. The clinical utility of AI-driven drug repurposing remains constrained by these limitations, particularly for rare and undertreated diseases where data is scarce. To address the need for a precise and explainable predictor, this paper introduces COMIC (COntrastive Masking with Interpretable Connections), a predictor that employs a multi channel architecture consisting of a feature masking branch, which identifies critical drug-disease interaction patterns by extracting the most…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Genes10

IL7R MS4A1 MTOR MC1R AKT1 HLA-DRA SLC39A11 CD274 CD40 PMEL

Proteins10

Species1

Homo sapiens(human · species)

Chemicals14

Lecanemab Prazosin Atezolizumab Bortezomib Ocrelizumab Metformin COMIC Baricitinib Thalidomide Ustekinumab Concizumab Lebrikizumab Nivolumab Tezepelumab

Diseases39

Figures3

Click any figure to enlarge with its caption.

Overview of the COMIC architecture, illustrating the constrastive masking approach

Insertion performance evaluationFig. 3Deletion performance evaluation

Funding1

—Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau (6375)

Keywords

Drug repurposingKnowledge graphsBiomedicineBioinformaticsAritificial intelligenceContrastive learning

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsComputational Drug Discovery Methods · Machine Learning in Healthcare · Bioinformatics and Genomic Networks

Full text

Introduction

The drug repurposing paradigm encompasses interactions of drugs and disease from two perspectives: indications and contraindications. Indications refer to the FDA approved medical uses of a drug, specifically the diseases the drug is approved to treat [63]. These approvals are based on exhaustive clinical research, including clinical trials, real-world evidence, and comprehensive safety and efficacy assessments. Predicting indications is the most significant step for drug repurposing as it encapsulates the potential of a drug to treat a specific disease, forming the primary objective. On the other hand, Contraindications are situations where a drug should not be used because it may cause harm to patients with certain diseases [63]. In drug repurposing, understanding contraindications is a secondary step to ensure that a drug being considered for a new use does not pose risks to individuals with specific health conditions. But at the same time, uncovering new indications remains the key objective.

The motivation for finding new indications is grounded in the understanding that drugs often have cascading effects, meaning they can influence multiple biological pathways beyond their primary targets [3]. Around 30% of FDA-approved drugs have received approval for additional indications beyond their initial intended use with some accumulating over ten new therapeutic applications [54]. However, many successful drug repurposing efforts have historically resulted from serendipitous discoveries [56]. These include cases where clinicians observed unexpected benefits when drugs were used for conditions other than their original purpose, such as the use of Metformin to treat polycystic ovary syndrome and the unexpected drug benefits in off-label settings of Thalidomide for multiple myeloma [6, 52]. In contrast to random discoveries, a systematic strategy could enable the repurposing of existing drugs for new therapeutic uses with fewer side effects [29]. For instance, systematic analysis of drug-protein interaction networks led to the identification of kinase inhibitors like Baricitinib for COVID-19 treatment by targeting both viral entry and inflammation pathways, while network-based screening revealed the potential of antihypertensive drugs like Prazosin for treating post-traumatic stress disorder (PTSD) through shared molecular pathways [46, 53].

One of these strategies is the potential matching of drugs to new indications through the analysis of medical knowledge graphs (KGs) [13]. By evaluating the actions of drugs on cell signaling pathways, gene expression profiles, and disease phenotypes, potential therapeutic candidates can be identified [3]. The underlying patterns of these actions are learned by geometric deep-learning models, which are trained on extensive medical KGs [37]. These optimized models can efficiently align disease signatures with drug candidates by analyzing their biological networks [37].

While approaches that utilize knowledge graphs (KGs) and deep learning models have improved the systematic identification of repurposing candidates, most current approaches function as ‘black boxes.’ These comprehensive, state-of-the-art predictors [1, 27] provide predictions without offering clear, human-understandable explanations of their underlying reasoning [62]. This lack of explainability poses a significant barrier to clinical adoption and validation of computational drug repurposing predictions [65]. From a biological perspective, the identification of specific pathways connecting drug candidates to disease targets is crucial for several reasons [60]. Biological pathways represent the fundamental mechanisms through which drugs exert their therapeutic effects. Through explainable methods, accurate prediction of pathways provides mechanistic validation that a predicted drug-disease association operates through biologically plausible routes, distinguishing meaningful therapeutic opportunities from spurious correlations [12]. Moreover, pathway-level explanations enable researchers to assess safety implications by identifying potential off-target effects or conflicting regulatory mechanisms that might contraindicate use in specific patient populations [10, 72]. Additionally, a drug repurposed for a disease can be unrelated to the indication it was initially studied for, as exemplified by Metformin’s evolution from a diabetes treatment to potential anti-cancer properties [55]. Thus, mechanistic explanations for each prediction are crucial for building clinical trust in AI.

On the other hand, existing predictors that leverage knowledge graphs to predict drug-disease interactions, and give pathway level interpretability [68], struggle to balance predictive power with plausible explanations. Some of these approaches rely on predefined path schemas [40], limiting their ability to discover novel biological mechanisms . Others use complex learning strategies (such as reinforcement learning) [39, 14] that require extensive tuning, making them less adaptable across different diseases. Rule-based methods provide clear explanations [45] but struggle with generalization, while deep learning based predictors often prioritize accuracy at the cost of transparency [35]. These challenges highlight the need for approaches that can provide biologically meaningful explanations without sacrificing predictive performance.

Apart from the trade-off between performance and explainability, another major challenge that predictors face is data scarcity. Around 92% of 17,080 examined diseases have no approved indications and this scarcity is even more pronounced for rare diseases, where approximately 95% lack FDA-approved treatments and up to 85% have no promising drug candidates [17, 29]. This limited data constrains predictive methods, leaving them with too few examples to learn meaningful drug-disease relationships. These minimal therapeutic options and limited molecular understanding poses a significant challenge for drug-repurposing predictors [29].

These challenges highlight the need for predictors that can effectively learn from sparse data while providing mechanistic insights to validate repurposing predictions, particularly when exploring novel therapeutic applications. To address this gap, we introduce COMIC, a predictor that utilizes COntrastive Masking with Interpretable Connections and aims to balances predictive performance with interpretability. The primary motivation behind incorporating contrastive masking is to identify and preserve the most informative features within sparse data. Through contrastive learning, these networks distinguish meaningful patterns and paths, allowing the predictor to capture both common and rare but significant interaction signatures. This selective approach makes the prediction process interpretable at multiple levels: from identifying critical interaction features through the feature selection network to discovering relevant biological pathways through the path selection network. This enables tracking how each prediction emerges through selected features and pathways while maintaining strong predictive performance. The proposed predictor is evaluated across nine disease categories of the PrimeKG knowledge graph (comprising 17,080 diseases, and 4 M+ relationships). COMIC demonstrated a significant performance boost over the current state-of-the-art [29], achieving an average improvement of 4.29% in indication prediction and 14.82% in contraindication prediction, resulting in an overall average gain of 9.55% across both interaction types.

To validate COMIC’s real-world applicability, we evaluated its performance on 30 FDA-approved (2022–2025) repurposed indications that are not present in PrimeKG. The predictor successfully predicted 21 out of 30 (70%) approved indications, with 10 predictions ranking above the 90th percentile. Compared to the state-of-the-art [29], COMIC showed superior discrimination ability. These results demonstrate COMIC’s practical utility while highlighting opportunities for improvement in challenging domains like oncology and neurodegenerative diseases. To facilitate broad adoption, we have deployed our predictor as a web application that provides both predictions and their comprehensive explanations for indications, contraindications, and off-label use https://sds-genetic-interaction-analysis.opendfki.de/drug_prediction/.

Related work

Recent technological advances have led to the rapid development of numerous knowledge graph-based drug repurposing predictors. These predictors can be categorized into two distinct groups: (1) predictors that solely identify interactions between drugs and diseases and (2) predictors that not only make interaction predictions but also provide explanations. As discussed in the previous section, explanations play a crucial role in drug repurposing; therefore, this section only provides a high-level overview of the second category of predictors. These explainable predictors have evolved into distinct streams, each employs different strategies for model interpretability, ranging from attention-based path weighting to rule-based reasoning systems. While these methods have made significant progress in explaining drug-disease associations, they involve trade-offs between interpretability, computational efficiency, and biological plausibility.

KGML-xDTD employs a hybrid approach combining knowledge graph embeddings with reinforcement learning to identify drug-disease associations [40]. MINERVA and PoLo employ similar reinforcement learning agents to traverse KGs, generating rule-guided paths that mimic human hypothesis generation [14, 39]. While KMGL-xDTD’s attention-weighted metapaths capture biological relationships, it is constrained by predefined metapath schemas and reinforcement learning based predictors require extensive reward engineering and may overfit to specific traversal patterns, especially on large graphs like PrimeKG.

XG4Repo employs the RNNLogic+’s framework which uses neural logical reasoning in which a RNN generates logic rules for drug repositioning patterns [31, 51]. Hybrid frameworks like SAFRAN integrate KG embeddings with rule-based systems, using Noisy-OR functions to aggregate metapath scores [45]. These rules based system are often constrained by the expressiveness of their generated rules, which may fail to generalize across diverse drug-disease relationships.

A promising approach involves using graph explanation models to interpret Graph Neural Networks (GNNs). These methods analyze GNNs by selectively masking parts of the graph and assessing how the absence of these parts impacts predictions, thereby highlighting the most influential components in the decision-making process. RD-Explainer adapts GNNExplainer to generate semantic subgraphs highlighting genes, symptoms, and pathways [49, 69]. TxGNN integrates message-passing graph neural networks with metric learning to rank drug-disease pairs across 17,080 diseases [29], achieving state-of-the-art performance. However, its interpretability relies on a post-hoc GraphMask-based explainer to identify critical pathways [58]. COMIC builds upon TxGNN’s extensive disease coverage and GNN architecture, introducing a contrastive masking-based predictor that provides end-to-end interpretability. Unlike TxGNN, COMIC learns to distinguish relevant features and pathways during training, making interpretability an inherent part of its process. This design allows COMIC to jointly optimize for both prediction accuracy and explanation quality, adapting to complex feature interactions while maintaining strong performance across the full disease spectrum.Fig. 1. Overview of the COMIC architecture, illustrating the constrastive masking approach

Materials and methods

This section provides a detailed overview of the COMIC predictor’s architecture, describes the dataset used for its evaluation, and outlines the metrics employed to assess both its predictive performance and the interpretability of its explanations.

The proposed predictor

As illustrated in Fig. 1, COMIC takes a knowledge graph G as input, which is formally represented as a collection of triples, where each triple t is defined as:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} t = (h, r, l) \end{aligned}$$\end{document}

where h and l denote the head and tail entities respectively, and r represents the relation between them.

The first step is embedding generation. For each entity and relation, dense vector representations are generated:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} e_h, e_l \in \mathbb {R}^{256}, e_r \in \mathbb {R}^{256} \end{aligned}$$\end{document}

These embeddings are generated through Xavier initialization and are normalized for stability [21]. The normalized entity and relation embeddings then flow into a multi-channel architecture which has two branches: the feature masking branch and the path masking branch.

Feature masking branch

This branch first processes each input triple through the Triple Encoding layer and its output is fed into the Feature Masking layer.

The triple encoding layer first concatenates the head, relation, and tail embeddings to preserve their sequential structure. Positional encoding is then applied to enhance this sequence with ordering information. The position-enhanced sequence is passed through a self-attention layer, enabling the triple embedding to maintain the sequence of the triple elements. The attended representation undergoes mean and max pooling, followed by concatenation and compression to produce a 256-dimensional triple representation:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \text {TripleEnc}(t) = \text {Compress}\Big (\text {Pool}\Big (\text {Attention}\big (\text {PE}(X_t)\big )\Big )\Big ) \end{aligned}$$\end{document}

Here, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X_t$$\end{document}$ represents the triple input which is comprised of the head, tail, and relation embeddings, and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \text {PE} $$\end{document}$ denotes positional encoding.

The output of the triple encoding layer is input to a contrastive masking layer that identifies and preserves the most relevant features. The inner working of this layer is described in Sect. 3.3.

Path masking branch

In parallel with triple processing, we samples paths between the head and tail entities (path sampling is detailed in Sect. 4.3).

Each sampled path is processed by a Path Encoding layer, which embeds each triple in the path using a position-aware neural network. The encoder applies self-attention across all triplets in the path and compresses the attended representation to 256 dimensions. The position-aware encoding ensures the path embedding maintains the sequential nature of the path:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \text {PathEnc}(p) = \text {Compress}\Big (\text {Attention}\big (\text {PE}(X_p)\big )\Big ) \end{aligned}$$\end{document}

Here, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X_p$$\end{document}$ represent the sequence of triplets in the path, and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \text {PE} $$\end{document}$ denotes positional encoding.

The output of the path encoding layer is input to a contrastive masking layer that identifies and preserves the most relevant biological pathways. The inner working of this layer is described in Sect. 3.3.

The contrastive masking layer

The Contrastive Masking layer is inspired by the information bottleneck principle of learning compressed, relevant representations. This layer serves as a learnable filter that extract the most relevant features from its input for prediction.

For a given input representation X (either TripleEnc( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X_t$$\end{document}$ ) or PathEnc( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X_p$$\end{document}$ )), the masking layer produces both a filtered representation Z and a selection mask M through:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} & M = \sigma \Big (\frac{g_\theta (X) - \tau }{T}\Big ) \end{aligned}$$\end{document}

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} & Z = X \odot M + 0.1X \end{aligned}$$\end{document}

where:

$\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$g_\theta $$\end{document}$ is a multi-layer neural network
$\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau $$\end{document}$ is a learnable threshold
T is a temperature parameter
$\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\odot $$\end{document}$ represents element-wise multiplication The selection mask M controls which features are retained, while the residual term 0.1X ensures gradient flow. $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$g_\theta $$\end{document}$ is a 3-layer neural network with layer normalization and GELU activation [25].

The training objective is to learn a filtered representation Z that preserves only the information relevant for prediction from the original representation X while discarding noise. We accomplish this through contrastive learning using the InfoNCE estimator [64]. The key idea is to minimize the mutual information between input X and filtered representation Z (compression), while maximizing the mutual information between Z and target label Y. To quantify these information relationships, the InfoNCE estimator estimates mutual information between two representations, X and Z, through contrastive learning:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} I_{\text {NCE}}(X;Z) = \mathbb {E}\left[ \frac{1}{N} \sum _{i=1}^N \log \frac{f_\theta (x_i, z_i)}{\frac{1}{N}\sum _{j=1}^N f_\theta (x_i, z_j)}\right] \end{aligned}$$\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f_\theta $$\end{document}$ is a neural critic function that scores the similarity between the input X and its filtered representation Z, and N is the batch size.

The masking layer processes the encoded representations to produce:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} & (Z_{t}, M_{t}) = \text {ContrastiveMask}(\text {TripleEnc}(X_t)) \end{aligned}$$\end{document}

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} & (Z_{p}, M_{p}) = \text {ContrastiveMask}(\text {PathEnc}(X_p)) \end{aligned}$$\end{document}

These filtered representations and their corresponding masks enable both prediction and interpretation of the predictor’s decisions. The masks $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$M_{t}$$\end{document}$ and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$M_{p}$$\end{document}$ highlight which features and paths were most relevant for the prediction, while $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Z_{t}$$\end{document}$ and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Z_{p}$$\end{document}$ contain the information relevant for prediction, which is used for optimizing the training objective.

The contrastive learning objectives are computed as:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} & \mathcal {L}_{\text {triple}} = -I_{\text {NCE}}(Z_t;Y) + \beta I_{\text {NCE}}(X_t;Z_t) \end{aligned}$$\end{document}

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} & \mathcal {L}_{\text {path}} = -I_{\text {NCE}}(Z_p;Y) + \beta I_{\text {NCE}}(X_p;Z_p) \end{aligned}$$\end{document}

Where Y are the target labels and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta $$\end{document}$ is a hyperparameter that controls the balance between the two objectives of the contrastive masking layer. $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta $$\end{document}$ determines the trade-off between maximizing relevant selected information (negative term) and minimizing redundant information (positive term).

These losses guide the Contrastive Masking layer to learn optimal feature masks, balancing between retaining relevant information for prediction while discarding noise.

Interaction prediction and loss computation

The filtered triple representations from the feature masking layer ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Z_{t}$$\end{document}$ ) are passed through a binary classification neural network. The prediction is computed as:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \hat{Y} = f_{\text {classify}}(Z_{\text {t}}) \end{aligned}$$\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f_{\text {classify}}$$\end{document}$ is the classification network (a 3-layer GELU activated neural network).

The Binary Cross Entropy loss between the true labels Y and predicted outputs $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{Y}$$\end{document}$ is calculated as:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \mathcal {L}_{\text {BCE}} = -\frac{1}{N}\sum _i [Y_i\log (\hat{Y}_i) + (1-Y_i)\log (1-\hat{Y}_i)] \end{aligned}$$\end{document}

The total loss combines the prediction loss with the contrastive learning objectives:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \mathcal {L}_{\text {total}} = \mathcal {L}_{\text {BCE}} + \beta _2(\mathcal {L}_{\text {triple}} + \mathcal {L}_{\text {path}}) \end{aligned}$$\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta _2$$\end{document}$ is a weighting factor that determines the contribution of the contrastive masking loss to the overall training loss. A smaller $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta _2$$\end{document}$ helps balance predictive power optimization with the more complex contrastive masking objective, as the prediction loss converges faster than the contrastive masking loss.

Dataset

Drug repurposing—the process of identifying new therapeutic uses for existing drugs—requires comprehensive understanding of biological interactions across multiple scales, from molecular mechanisms to clinical outcomes. While several biomedical knowledge graphs exist, including SPOKE [42], GARD [71], Hetionet [26], DRKG [70], BioKG [66], and PrimeKG [13], we evaluate COMIC on PrimeKG due to its unique characteristics that facilitate in-depth understanding of drug-disease relationships.

PrimeKG demonstrates higher disease coverage compared to existing biomedical knowledge graphs. With 22,236 disease terms consolidated into 17,080 clinically meaningful conditions, PrimeKG offers unprecedented disease coverage that is 1–2 orders of magnitude greater than previous datasets. PrimeKG contains 17,080 diseases compared to much smaller coverage in other graphs: Hetionet has only 137 diseases (over 100 times fewer), DRKG has 5103 diseases (3 times fewer), and GARD focuses on 6323 rare diseases only. Even SPOKE, which is a very large knowledge graph with 27 million nodes, contains only 10,932 diseases, making PrimeKG’s disease coverage nearly twice as comprehensive. This extensive disease representation is crucial for identifying novel drug applications across a wide spectrum of conditions.

Moreover, PrimeKG features 30 distinct relation types specifically designed for drug-related information, including critical relationships for repurposing such as indication, contraindication, and off-label use. These specialized drug-specific relationships are largely absent in other graphs, which typically use non-descriptive relation types of the form "Drug-Disease" without specifying the nature of the relationship. The drug-specific relations provide complementary signals that are essential for comprehensive drug repurposing: indication relationships serve as positive examples of validated therapeutic applications, contraindications highlight important safety boundaries and potential adverse effects, and off-label use cases offer valuable hints about promising therapeutic directions. Second, the authors of PrimeKG employed advanced AI methods to group similar diseases together into clinically meaningful categories instead of keeping them separate like other graphs do [13]. For example, "Type 1 diabetes" and "Insulin-dependent diabetes mellitus" are merged into a single disease "Type 1 diabetes mellitus" in PrimeKG instead of treating them as separate entities. This consolidation removes redundancy and noise, making representation learning more effective.

Furthermore, despite its comprehensive disease coverage, PrimeKG is ideally sized for AI applications. It provides rich learning data without the computational burden and noise of massive graphs, and remains comprehensive enough to avoid the limitations of smaller datasets. The integration of data from 20 authoritative sources, including DrugBank [33], DisGeNET [50], CTD [15], Gene Ontology [61], Human Phenotype Ontology [20], and Reactome [16], provides PrimeKG with the multi-scale biological context necessary for explainable drug repurposing [13]. This integration results in a graph structure comprising 129,375 nodes across 10 biologically relevant types, connected by 4,050,249 unique relationships, capturing the complex web of biological interactions relevant to drug repurposing, from molecular pathways to clinical manifestations.

Additionally, with 90.8% of Orphanet’s 9348 rare diseases represented, PrimeKG enables drug repurposing for rare diseases–a particularly valuable application given the limited treatment options for these conditions [13]. These characteristics make PrimeKG an ideal foundation for developing explainable AI for drug repurposing, capable of both identifying novel therapeutic applications and providing mechanistic explanations for their predictions.

Evaluation metrics

To comprehensively evaluate COMIC, we employ two distinct sets of evaluation strategies: one for assessing prediction performance and another for interpretability assessment. For prediction performance, we follow standard practices from the biomedical literature [4, 5, 29, 40, 44, 49], using Area Under the Precision-Recall Curve (AUPRC) as the primary metric, complemented by accuracy for detailed performance analysis.

The prediction evaluation metrics are summarized in Equation 15:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} f(x)_{eval} = {\left\{ \begin{array}{ll} \text {Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \\ \text {AUPRC} = \int _{0}^{1} P(r) dr \end{array}\right. } \end{aligned}$$\end{document}

where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives respectively. TPR denotes the true positive rate, FPR the false positive rate, and P(r) the precision at recall value r. These metrics collectively provide a robust assessment of our predictor’s ability to correctly predict drug-disease associations.

Following the evaluation criteria of explainable AI (XAI) literature [11], we assess graph-based explanations using insertion and deletion faithfulness tests, which evaluate the importance of the graph’s edges in the prediction process. Additionally, we assess biological plausibility by comparing the highlighted pathways with established mechanisms of action from medical literature.

We evaluate graph interpretability using the following two metrics: a. Insertion: we start from an empty graph and gradually insert top K% edges to the graph where the importance is measured by COMIC’s interpretable connections. We measure the updated AUPRC on the test set using different values of K. A better XAI method has a higher AUPRC at each threshold since it suggests that the newly added edges are informative to the graph predictions ( also referred to as fidelity+). b. Deletion: we start from the full graph and gradually delete top K% edges to the graph where the importance is measured by COMIC’s interpretable connections. We measure the updated AUPRC on the test set using different values of K. A better XAI method has a lower AUPRC at each threshold since it suggests that the deleted edges are necessary for the graph predictions (referred to as fidelity-).

To evaluate the biological plausibility of the pathways highlighted by COMIC, we turn to medical literature and seek pathways that incorporate the indications’ mechanism of action (MOA). Specifically, we conduct a literature review to identify established biological pathways associated with each drug’s therapeutic indications and compare these with our model’s highlighted pathways.

Experimental setup

This section covers the proposed predictor’s experimental and implementation details. COMIC is implemented in PyTorch 2.0 for GPU-accelerated computation, with Numpy for handling vectorized operations during preprocessing and inference phases [47, 24]. The web application is implemented in Django with a PostgreSQL backend.

Data splits

We utilize the data split methodology established by Huang et al., which splits PrimeKG into nine disease areas [29]. Instead of using a typical percentage-based split across all data, these splits create "disease area splits" to better simulate real-world drug repurposing challenges where new diseases have limited associated data.

The splitting process works as follows for each disease area:

Test set: For a chosen disease area, all known drug indications and contraindications related to diseases in that area are completely removed from the main dataset. These removed relationships then form the test set for that specific disease area. This approach simulates scenarios where we attempt to find new treatments for diseases that currently have few or no known drugs.

Validation and training set: From the remaining edges after test set removal, 12.5% are randomly sampled for the validation set, with sampling done independently for each relation type to ensure representativeness. The remaining approximately 87.5% of edges form the training set.

Table 1 shows the statistics for each of the nine disease areas, detailing the number of diseases, indications, and contraindications in each area.Table 1. Statistics for the nine disease area splits showing the number of diseases, indications, and contraindications in each areaDisease areaDiseasesIndications****ContraindicationsCell proliferation1838541007Mental health562131038Cardiovascular1043003131Anemia1555545Adrenal gland633303Autoimmune1875319Metabolic5468523Diabetes3102364Neurodegenerative16123134

Each test set is carefully curated following three strict criteria: complete exclusion of approved drugs from the training data, minimal disease similarity with the training set, and removal of biological neighbor information to limit molecular data availability [29]. Additionally, a fraction (5%) of the connections between biomedical entities to these diseases were removed from PrimeKG for stringent testing [29].

We also compare on a random split where Huang et al. used a traditional approach with 95% for training and 5% for testing. From the training set, 12.5% is used for validation and the remaining is used for training as per their settings.

The splits encompass major therapeutic domains including adrenal gland disorders, autoimmune diseases, neurodegenerative conditions, metabolic disorders, and cardiovascular diseases, with additional splits covering anemia, diabetes, cancer, and mental health. These splits were generated using the exact same methodology as Huang et al., utilizing their open-source codebase. This systematic partitioning ensures COMIC is tested under realistic drug repurposing conditions, where the predictor must generalize to diseases with limited prior knowledge.

Negative sampling strategies

Negative sampling is a pivotal technique in Graph based Learning, essential for training predictors to effectively distinguish between valid and invalid relationships. The quality of negative samples significantly influences the performance of predictors, as they provide the necessary contrast to positive samples during training. However, generating high-quality negative samples poses challenges, particularly in ensuring that these samples are informative and contribute to robust learning. [41]. Various negative sampling strategies have been proposed in the literature to address these challenges. Random negative sampling is the most straightforward approach, where entities in triples are randomly replaced to create negative samples. While efficient, this method can produce trivial negatives that may not effectively challenge the predictor [41]. Model-guided and knowledge-constrained strategies leverage pre-trained models or external knowledge to generate semantically meaningful negatives, albeit with increased computational complexity [41]. Moreover, probabilistic negative sampling adjusts the sampling probability based on entity frequency, focusing on harder negatives.

In our work, we implement two distinct negative sampling strategies:

Degree-based hard negative sampling: This approach generates negative samples by replacing either the head or tail entity in a triple, with the replacement probability proportional to the node degrees. Higher-degree nodes have a higher likelihood of being selected as replacements, resulting in more challenging negative samples that can enhance training by focusing on harder-to-predict cases.
Random sampling with fixed destination nodes: In this strategy, negative samples are created by maintaining the destination nodes while randomly selecting source nodes from the set of valid entities. To ensure diversity and robustness in learning, we adopt an epoch-wise regeneration strategy, where new random negative samples with fixed destinations are generated at the beginning of each training epoch.

Path sampling approach

In large-scale knowledge graphs, comprehensive path finding between entities is often impractical due to computational constraints [8]. To address this, various path sampling strategies have been developed. One common approach is the use of meta-paths [68], which are predefined sequences of relation types that capture specific semantic relationships within heterogeneous graphs. This method leverages domain knowledge to guide the sampling process, focusing on meaningful and interpretable connections [38].

Alternatively, random walk-based sampling offers a more flexible approach by exploring the graph without any constraints. This allows for the discovery of novel and potentially unexpected connections, as the predictor learns path patterns directly from the data.

In our study, we implement a random walk sampling strategy to explore the knowledge graph:

Random walk sampling: For each drug-disease pair, we perform random walks to sample up to 5 distinct paths, each limited to a maximum length of 5 relations. Then, 5 false paths are created by randomly corrupting the last two nodes from each path.
Diversity threshold: To ensure the quality of extracted evidence, we implement a diversity threshold that prevents the collection of redundant paths. Specifically, a new path is accepted only if its overlap with previously sampled paths falls below 50%, compelling the sampler to identify varying patterns between each drug-disease pair.

Training procedure

We train the COMIC predictor using the Adam optimizer [32] with a learning rate of 0.0005. The predictor is trained for a maximum of 30 epochs with a batch size of 1024. For the contrastive masking layer, we set the temperature parameter to start at 1.0 and anneal it linearly throughout training to a minimum of 0.1, while $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta $$\end{document}$ is fixed at 0.1. The weighting factor $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta _2$$\end{document}$ is set to 0.2 to balance the contribution of the contrastive masking loss with the prediction loss. We employ early stopping with a patience of 2 epochs on the validation loss to prevent overfitting and select the best performing weights. All hyperparameters are tuned with the validation set, and we use gradient clipping with a maximum norm of 1.0 to ensure stable training.

Results

This section illustrates the COMIC predictor’s performance under four distinct experimental settings. (1) A comprehensive comparison of COMIC against eight baseline methods and TxGNN using the established disease area splits. (2) Validation of COMIC’s real-world applicability by testing its predictions on recent FDA approvals that occurred after PrimeKG’s construction. (3) Assessment of COMIC’s interpretability through insertion and deletion experiments that demonstrate the predictor’s ability to identify biologically meaningful relationships. (4) Examination of the biological credibility of COMIC’s explanatory pathways through case studies of recently approved drug-disease pairs.Table 2. Performance in predicting drug indications. Reported is AUPRCCategoryKLJSProximityDSDRGCNHANHGTBioBERTTxGNNCOMICRandom0.500.490.530.490.840.870.720.810.910.9375Cell proliferation0.510.500.470.510.470.650.610.830.920.9031Mental health0.500.500.440.500.550.610.560.640.890.9046cardiovascular0.490.500.500.500.410.630.600.660.640.8817Anemia0.470.530.490.510.420.670.530.640.960.9302Adrenal gland0.480.500.480.500.380.590.540.640.990.9375Autoimmune0.510.490.560.520.510.530.490.660.870.9284Metabolic disorder0.510.540.410.510.670.560.540.680.760.9362Diabetes0.500.520.470.520.400.670.620.690.870.9114Neurodegenerative0.500.510.490.510.840.710.600.570.950.9182Table 3Performance in predicting drug contraindications. Reported is AUPRCCategoryKLJSProximityDSDRGCNHANHGTBioBERTTxGNNCOMICRandom0.490.500.560.490.770.840.630.580.820.9646Cell proliferation0.490.490.500.500.540.510.470.780.870.9233Mental health0.500.500.500.500.670.440.560.540.790.9267cardiovascular0.490.490.510.500.630.470.510.560.720.8836Anemia0.500.500.500.500.650.540.510.530.740.9497Adrenal gland0.490.500.480.490.750.460.550.470.880.9502Autoimmune0.500.490.510.500.680.400.610.400.860.9486Metabolic disorder0.490.510.470.510.580.560.510.530.780.9525Diabetes0.500.510.480.510.570.560.520.510.690.9564Neurodegenerative0.510.510.560.510.710.420.550.600.780.9560

Predictive performance

COMIC’s predictive performance is compared against eight baseline methods and TxGNN using the disease area splits and results reported by [29] (described in Section 4.2.1). The baseline methods span network statistical techniques (Kullback–Leibler divergence, Jensen–Shannon divergence), graph-theoretical approaches (network proximity, diffusion state distance), modern GNN architectures (RGCN, HGT, HAN), and language models (BioBERT) [23, 22, 9, 72, 57, 28, 67, 36].

To ensure fair and robust experimental comparisons, our reported performance results are averaged across 5 random seeds (1–5), which are identical to those used by Huang et al. for their TxGNN results (described in Sect. 4.1). Huang et al.’s codebase was utilized to generate the train/validation/test splits for each seed without any modification, ensuring identical data partitions across all comparisons. The running environment was configured identically to theirs, with consistent package and library versions to ensure reproducibility. For the eight baseline methods, we use the performance values reported by Huang et al., which were generated under the same experimental conditions.

To ensure robust comparison, we first conducted an analysis of negative sampling strategies. While Huang et al. [29] employed both random and degree-based sampling in their work, they did not specify which approach was used for their final reported results. Given this ambiguity, we evaluated both approaches through validation on recent FDA-approved drugs. We found that degree-based hard negative sampling outperformed random generation by an average margin of 10.2% in ranking correct indications higher. This superior performance better reflects real-world drug repurposing scenarios by forcing the predictor to correctly distinguish between therapeutically similar but distinct drug-disease pairs. More importantly, by choosing degree-based hard negative sampling for our comparative analysis, we ensure our reported performance metrics represent conservative estimates relative to potentially simpler random sampling approaches. The following results use degree-based hard negative sampling and are averaged across five random seeds (1–5).

COMIC demonstrated consistently high performance across all nine disease areas for both indication and contraindication prediction tasks (Table 2 and 3). For indications, COMIC achieved AUPRC scores ranging from 0.88 to 0.94, surpassing the state-of-the-art (Table 2). Notably, while the state-of-the-art [29] showed strong performance in specific areas (AUPRC 0.95 for adrenal gland, anemia, and neurodegenerative diseases), COMIC maintained more consistent performance across all disease categories, with an average improvement of 4.29% over TxGNN. This consistency is particularly evident in challenging areas like cardiovascular diseases, where COMIC achieved an AUPRC of 0.88 compared to TxGNN’s 0.64, representing a 37.8% improvement. For contraindications, COMIC outperformed existing methods across all disease areas, with AUPRC scores ranging from 0.88 to 0.96 (Table 3). The predictor achieved particularly strong results in diabetes (AUPRC: 0.95) and metabolic disorders (AUPRC: 0.95), areas where accurate contraindication prediction is crucial for patient safety. The performance gap was again most pronounced in cardiovascular diseases, where COMIC maintained robust performance (AUPRC: 0.88) while other methods showed significant degradation. Again, COMIC maintained more consistent performance across all disease categories, with an average improvement of 14.82% over TxGNN. COMIC overall achieved an average gain of 9.55% across both interaction types. These results establish COMIC’s predictive power in a test setting.Table 4COMIC’s performance on recent FDA-approved repurposed indications. The percentile indicates how high the drug was ranked relative to all possible indications for the diseaseDrugDiseaseCOMICPercentilePredictionRareTxGNNUstekinumabPsoriatic arthritis0.93797.147IndicationNo0.865AfliberceptDiabetic macular edema0.94796.318IndicationNo0.989TirzepatideType 2 diabetes mellitus0.92695.513IndicationNo0.286HyaluronidaseMultiple sclerosis0.90295.048IndicationNo–AfliberceptMacular degeneration0.94594.722IndicationNo0.858AfliberceptDiabetic retinopathy0.93892.082IndicationNo0.963OcrelizumabMultiple sclerosis0.8991.441IndicationNo–XanomelineSchizophrenia0.86690.851IndicationNo–UstekinumabUlcerative colitis (disease)0.90590.637IndicationNo0.698BortezomibMantle cell lymphoma0.91490.084IndicationYes0.942NivolumabColorectal cancer0.90786.113IndicationNo0.017NivolumabMelanoma0.87674.915IndicationNo–TocilizumabCOVID-190.6274.324IndicationNo–ArimoclomolNiemann-Pick disease0.58366.344IndicationYes–AtezolizumabHepatocellular carcinoma0.36863.592No indicationNo0.005NivolumabHead and neck cancer0.5663.252IndicationNo0.077GanaxoloneCDKL5 disorder0.50561.958IndicationYes0.335TezacaftorCystic fibrosis0.85358.854IndicationNo0.999MaribavirCytomegalovirus infection0.61757.409IndicationYes0.033AtezolizumabMelanoma0.83154.38IndicationNo0.197PacritinibMyelofibrosis0.2951.125No indicationYes0.011ConcizumabHemophilia0.76145.381IndicationYes–TezepelumabAsthma0.1536.823No indicationNo0.233AtezolizumabAlveolar soft part sarcoma (disease)0.74227.536IndicationYes0.059LebrikizumabAtopic dermatitis0.07824.821No indicationNo–InclisiranFamilial hypercholesterolemia0.06124.18No indicationYes0.301NivolumabGastric cancer0.07313.673No indicationNo–NivolumabRenal cell carcinoma (disease)0.0148.496No indicationYes–NivolumabUrothelial carcinoma0.0051.873No indicationYes–LecanemabAlzheimer disease0.0151.797No indicationNo–

Evaluation of COMIC on recent FDA-approvals

The next step is to validate COMIC’s real-world applicability. For this, we test the predictor on 30 repurposed drug-disease pairs that received FDA approval between 2022–2025. These approvals occurred after the knowledge graph’s construction, ensuring that no relationships between these drug-disease pairs existed in the training data. Highlighted in Table 4 this evaluation provides a stringent test of the predictor’s ability to identify drug repurposing candidates.

Despite achieving exceptional AUPRC scores on the test set, COMIC’s performance on completely novel FDA approvals revealed important real-world challenges. The predictor successfully predicted 21 out of 30 (70%) approved indications, with 10 of these predictions ranking above the 90th percentile (Table 4). This notable performance gap between testing and prospective evaluation underscores the inherent difficulty of predicting therapeutic applications, particularly when mechanistic understanding is still evolving for most diseases.

COMIC still demonstrated strong predictive ability for several therapeutic areas. For autoimmune conditions, Ustekinumab for Psoriatic Arthritis ranked in the 97th percentile, while for metabolic disorders, Tirzepatide for Type 2 Diabetes ranked in the 95th percentile. Aflibercept showed consistently high rankings across multiple indications: diabetic macular edema (96th percentile), macular degeneration (94th percentile), and diabetic retinopathy (92nd percentile).

To further investigate the gap in performance, we compared COMIC’s performance against TxGNN using the TxGNNExplorer (txgnn.org) [29], which provided TxGNN scores for 18 out of the 30 drug-disease pairs as shown in Table 4. The comparison reveals COMIC’s superior discrimination ability (scoring higher for 12 out of 18 pairs), suggesting our predictor captures complex molecular interactions better in the context of these pairs.

Where COMIC struggled most was with certain cancers and novel therapeutic mechanisms. For instance, Nivolumab received varied predictions across different cancer types, ranging from high confidence for Colorectal Cancer (86th percentile) to low confidence for Urothelial Carcinoma (1.8th percentile) and Renal Cell Carcinoma (8.5th percentile). Similarly, recent Alzheimer’s disease therapy Lecanemab received a low ranking (1.8th percentile), highlighting challenges in neurodegenerative diseases where their mechanisms are still being understood [19].

To understand the structural factors underlying poor predictions, we performed in-depth structural diagnostic analysis on the 10 lowest-performing drug-disease pairs from Table 4. Our analysis reveals that for each low score, one of two distinct patterns emerges that explain the prediction difficulties.

The first pattern involves drug connection congestion, where drug connections are overwhelmed by drug-drug similarities, masking their information and resulting in ineffective learning. Drugs like Nivolumab showcase this pattern, where sophisticated drugs are reduced to simplistic similarity-based representations that obscure their mechanisms of action. Nivolumab has 678 total connections, but 668 (98.5%) are drug-drug similarities, with only 8 indication relationships and 2 drug-protein interactions. This extreme imbalance means the model learns the representation for Nivolumab primarily through drug-drug relationships rather than its critical mechanisms, which explains poor performance on cancer indications where mechanism-based reasoning is essential. Similarly, Lecanemab has 660 connections with 100% drug-drug similarities and zero mechanistic relationships, while Lebrikizumab has 660 connections with 658 (99.7%) being drug-drug similarities and only 2 protein interactions. Across the set of lowest scoring congested drugs, we observe an average 95.8% drug-drug interaction dominance with mechanistic relationships barely represented. This drug-drug interaction based learning limits the model’s ability to understand their therapeutic potential.

The second pattern involves disease information sparsity, where low performing diseases have severely limited connections, and existing connections lack biological relevance to the disease’s physiology. For example, atopic dermatitis has only 4 direct connections with minimal inflammatory pathway representation. Sample connections include irrelevant terms like "Clumsiness" and "melanocyte-stimulating hormone receptor activity." Urothelial carcinoma has 16 connections that are mostly anatomical rather than pathological, with examples like "lumbar dorsal root ganglion," "laminin-6 complex," and "renal pelvis/ureter carcinoma." Myelofibrosis suffers from similar issues with 14 connections that are mostly anatomical rather than mechanistic, including "GINS complex" and "left ventricular trabecular myocardium morphogenesis." This sparsity pattern results in limited pathway representation, insufficient phenotype connections, absent mechanism links, and poor biological context.

This structural diagnostic suggest that COMIC’s limitations stem from knowledge graph representation issues. These findings suggest knowledge enrichment strategies including reducing drug-drug interactions dominance through relation-weighted sampling, enhancing disease pathway representations through literature mining, and developing quality metrics for balanced knowledge representation. These changes can significantly improve computational drug repositioning for the diseases that need it most. These results reveal both COMIC’s strengths and limitations in real-world drug repurposing scenarios. While the 70% prediction rate demonstrates substantial practical utility, the performance differential between the test set and recent approvals highlights opportunities for further refinement.Fig. 2. Insertion performance evaluationFig. 3Deletion performance evaluation

Evaluation of COMIC’s interpretability

To assess COMIC’s interpretability, we conduct insertion and deletion evaluation described in Sect. 3.3. In the insertion experiment, as shown in Fig. 3, the AUPRC steadily increased—from 0.67 at 10% insertion up to 0.99 when all edges were inserted—demonstrating that the edges identified as influential indeed contribute substantially to COMIC’s predictive performance. Conversely, the deletion test started with the full graph and gradually removed edges ranked by their importance. As shown in Fig. 2, the resulting deletion curve reveals a marked decrease in AUPRC at each step (from approximately 0.99 with no removal to 0.62 at 90% deletion), highlighting that even modest removals of highly ranked edges significantly impair COMIC’s outputs.

The initial steep decrease in deletion and the initial steep increase in insertion conform to the ideal in these evaluation metrics, confirming that COMIC’s interpretable connections are faithful to the predictor’s underlying decision-making process. Together, these trends indicate that the most highly-ranked edges are truly critical for accurate predictions as indicated by the rapid initial degradation during deletion and high initial gains during insertion. This demonstrates that COMIC’s interpretability is transparent and robust in identifying the edges most crucial to the predictor’s decision process.

Mechanistic pathways validation

We now examine the biological credibility of COMIC’s explanatory pathways. We consider the indication pairs in Table 4 for stringent testing since they were approved after PrimeKG’s construction.

In the case of Ocrelizumab for Multiple Sclerosis, COMIC identified several pathways that directly link its target, MS4A1 (CD20), to key immune mediators [18]. Notably, several valid pathways traverse through molecules such as HLA-DRA, CD40, NF-KB, and IL7R–elements that are well documented in the pathogenesis of multiple sclerosis and align closely with the drug’s known MOA of B cell depletion and subsequent modulation of T cell responses [2, 43, 34]. These mechanistic explanations are further supported by literature that underscores the importance of MHC class II interactions and costimulatory signals in mediating autoimmunity [30]. Similarly, for Atezolizumab in Melanoma, COMIC revealed pathways that directly connect CD274 (PD-L1) to known melanoma drivers [59]. In particular, pathways such as the MTOR–AKT1 cascade reflect established oncogenic processes in melanoma [48]. Additionally, a link through SLC39A11 to PMEL (gp100) directly ties to a melanocyte differentiation antigen and a key target in melanoma immunotherapy [7]. These paths provide mechanistically plausible explanations that resonate with current understanding of melanoma biology and the rationale behind immune checkpoint inhibition.

For well-connected diseases with rich mechanistic information, COMIC consistently identifies biologically meaningful pathways. However, our structural analysis reveals that the reliability and biological significance of COMIC’s pathway explanations depend heavily on the underlying connectivity patterns in the knowledge graph. When diseases suffer from sparse connectivity or drugs are dominated by similarity relationships rather than mechanistic connections, COMIC’s path sampler is forced to rely on more indirect associations. This impacts the coverage of mechanistic explanations rather than overall predictive reliability, as the model can still extract meaningful predictive signals from these indirect pathways.

For Bortezomib and Mantle Cell Lymphoma, the predictor predominantly returned paths based on drug effects. For instance, adverse effects such as dyssynergia, hyperuricemia, connect to drugs, which then eventually tie to other phenotypes like fatigue which indirectly tie to mantle cell lymphoma. Such paths do not capture the primary mechanism of action of Bortezomib, but instead appear to be driven by indirect associations present in the dataset. This pattern emerges particularly when the underlying molecular mechanisms are not well represented in the knowledge graph or when drug connectivity is dominated by drug-drug relationships that obscure mechanistic information.

For instance, in cases like Alzheimer’s disease and Lecanemab, asthma and Tezepelumab, or hemophilia and Concizumab, the predictor’s explanations were either absent or based on indirect disease associations rather than direct mechanism of action-related pathways. This occurs because drugs like Lecanemab have 660 connections where 100% of them are drug-drug interactions with zero mechanistic relationships, forcing the path sampler to traverse through less biologically meaningful associations.

One potential approach to address this limitation would be implementing meta-path constraints for sampling, where we define specific rules for graph traversal. For example, if the current node is a drug, the chosen neighbor could only be a protein, disease, or phenotype, ensuring biologically relevant path structures. However, this approach presents its own challenges, as rigid meta-path rules would prevent many node pairs from returning any paths due to structural limitations in the knowledge graph. For instance, Lecanemab’s 660 connections consist entirely of drug-drug relationships, and strict meta-path rules would fail to return any pathways.

These examples higlight that while COMIC is capable of retrieving highly plausible mechanistic paths, its explanations for indications lacking comprehensive mechanistic data tend to rely on indirect associations. The most effective solution would be the development of a balanced, highly curated knowledge graph with complete mechanistic information for each disease and drug. If PrimeKG had a balanced set of relation types and sufficient pathway representation for all diseases, COMIC could consistently identify biologically meaningful pathways while maintaining its predictive performance. This reliance on indirect associations when insufficient mechanistic evidence exists emphasizes the need for further enrichment of the underlying knowledge graph and continued refinement of our interpretability methods.

The COMIC web application

After extensive evaluation of COMIC’s capabilities, we have deployed the predictor as a web application (https://sds-genetic-interaction-analysis.opendfki.de/drug_prediction/) that provides predictions and their explanations for drug indications and contraindications. Users can select/search diseases, configure pathway parameters, and search for specific drugs. Results are presented in a sortable table with confidence scores for each drug-disease pair. The application also visualizes the pathway explanations connecting drugs to diseases through interactive network diagrams, helping researchers understand the potential mechanisms of action. This web applications aims to accelerate drug repurposing research by making COMIC’s predictions accessible to the broader scientific community, serving as a valuable resource for drug development researchers, clinicians exploring alternative therapies, and computational biologists investigating disease mechanisms.

Conclusion

The core contribution of this work lies in the development of a novel predictor, COMIC, that enables the systematic identification of potential new indications for existing drugs, with a particular focus on diseases that are characterized by a lack of effective therapeutic options. This approach offers a significant advancement in addressing the unmet medical needs associated with these conditions. COMIC achieves a 4.29% average improvement in indication prediction over the state-of-the-art in a test setting. The superior performance of the COMIC predictor, compared to existing approaches, is driven by its innovative feature and path masking branches. The feature masking branch successfully identifies critical drug-disease interaction signatures, while the path masking branch is capable of highlighting mechanistic pathways, together providing clinicians with transparent reasoning for COMIC’s predictions. COMIC’s real-world applicability is validated through its correct identification of 70% of recently approved therapeutic indications, demonstrating considerable practical utility while maintaining end-to-end interpretability. But the evaluation on recent FDA approvals also revealed important limitations, particularly for diseases with evolving molecular understanding such as certain cancers and neurodegenerative conditions. These challenges highlight the critical need for richer knowledge graphs with more comprehensive mechanistic data, especially for rare diseases where treatment options remain limited. Future research should focus on expanding the biological pathway representations within these graphs and refining interpretable mechanisms to better capture complex therapeutic relationships. By addressing these data gaps and continuing to enhance interpretability, we can further bridge the divide between computational predictions and clinical adoption of AI-driven drug repurposing.

Bibliography18

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Chan CS, Kong H, Guanqing L. A comparative study of faithfulness metrics for model interpretability methods. In: Muresan S, Nakov P, Villavicencio A (eds.) Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 5029–5038. Association for Computational Linguistics, Dublin, Ireland. (2022) 10.18653/v 1/2022.acl-long.345 . https://aclanthology.org/2022.acl-long.345/
2Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. In: Teh YW, Titterington M. (eds.) Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, 9:249–256. PMLR, Chia Laguna Resort, Sardinia, Italy (2010). https://proceedings.mlr.press/v 9/glorot 10a.html
3Guney E, Menche J, Vidal M, Baraasi A-L. Network-based in silico drug efficacy screening. Nat Commun. 2016;7:1–13.10.1038/ncomms 10331 PMC 474035026831545 · doi ↗ · pubmed ↗
4Hendrycks D, Gimpel K. Gaussian error linear units (GEL Us) (2023). arxiv:1606.08415
5Himmelstein DS, Lizee A, Hessler C, Brueggeman L, Chen SL, Hadley D, Green A, Khankhanian P, Baranzini SE. Systematic integration of biomedical knowledge prioritizes drugs for repurposing. e Life. 2017;6:26726 10.7554/e Life.2672610.7554/e Life.26726 PMC 564042528936969 · doi ↗ · pubmed ↗
6Kingma DP, Ba J. Adam: a method for stochastic optimization (2017). arxiv:1412.6980
7Knox C, Wilson M, Klinger CM, et al,. Drug Bank 6.0: the Drug Bank knowledgebase for 2024. Nucleic Acids Res 52(D 1), 1265–1275 (2024) 10.1093/nar/gkad 97610.1093/nar/gkad 976PMC 1076780437953279 · doi ↗ · pubmed ↗
8Liu A, Chen J, Du R, Wu C, Feng Y, Li T, Ma J. Heterosample: meta-path guided sampling for heterogeneous graph representation learning (2024). arxiv:2411.07022