DTIP-WINDGRU a novel drug-target interaction prediction with wind-enhanced gated recurrent unit

Kavipriya Gananathan; D. Manjula; Vijayan Sugumaran

PMC · DOI:10.1186/s12859-025-06141-0·July 20, 2025

DTIP-WINDGRU a novel drug-target interaction prediction with wind-enhanced gated recurrent unit

Kavipriya Gananathan, D. Manjula, Vijayan Sugumaran

PDF

Open Access

TL;DR

This paper introduces a new model for predicting drug-target interactions using a wind-enhanced GRU and optimization algorithm.

Contribution

The novel DTIP-WINDGRU model combines wind-driven optimization with GRU for improved drug-target interaction prediction.

Findings

01

DTIP-WINDGRU outperforms existing techniques in predicting drug-target interactions.

02

The model uses drug-drug and target-target interactions to initialize GRU weights effectively.

03

Wind Driven Optimization enhances hyperparameter selection for better performance.

Abstract

Identification of drug target interactions (DTI) is an important part of the drug discovery process. Since prediction of DTI using laboratory tests is time consuming and laborious, automated tools using computational intelligence (CI) techniques become essential. The prediction of DTI is a challenging process due to the absence of known drug-target relationship and no experimentally verified negative samples. The datasets with limited or unbalanced data, do not perform well. The models that use heterogeneous networks, non-linear fusion techniques, and heuristic similarity selection may need a lot of computational power and experience to implement and fine-tune. The latest developments in machine learning (ML) and deep learning (DL) models can be employed for effective DTI prediction process. To that end, this study develops a novel DTI Prediction model, namely, DTIP-WINDGRU Drug-Target…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Chemicals1

DTIP

Figures8

Click any figure to enlarge with its caption.

Process Model for the DTIP-WINDGRU Approach

Labeled Data Analysis of DTIP-WINDGRU model. a Enzyme dataset, b G Protein-Coupled Receptors dataset, c ION Channel dataset, d Nuclear Receptor dataset, showing labeled and unlabeled instances at various top k% values

AUC and AUPR analysis of DTIP-WINDGRU model on various CV_SEEDs values. a Enzyme dataset, b G Protein-Coupled Receptors dataset, c ION Channel dataset, d Nuclear Receptor dataset, displaying AUC and AUPR performance across five cross-validation sets

AUPR analysis of DTIP-WINDGRU model on Enzyme and GPCR datasets

AUPR analysis of DTIP-WINDGRU model on ION channel and nuclear receptor datasets

Ablation results of AUPR analysis with and without WDO in all four datasetsFig. 8Ablation results of AUC analysis with and without WDO in all four datasets

Funding1

—Vellore Institute of Technology, Chennai

Keywords

Drug-target interactionPrediction modelDeep learningGated recurrent unitWind driven optimization

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsComputational Drug Discovery Methods · Receptor Mechanisms and Signaling · Protein Structure and Dynamics

Full text

Introduction

In recent years, the prediction of drug-target binding affinity (DTA) has become increasingly pivotal in pharmaceutical research, facilitating the discovery and development of new therapeutic agents. Therefore, various computational methodologies have been developed to address this challenge, ranging from classical to advanced deep learning approaches.

The classical methods often relied on molecular docking simulations and quantitative structure–activity relationship (QSAR) models to predict DTA. These methods typically consider structural and chemical properties of both drugs and target proteins to estimate binding affinities. In contrast, novel approaches harness the power of machine learning and artificial intelligence, particularly deep learning. For instance, DeepDTA introduced by Öztürk et al. leverages deep neural networks to predict binding affinities by learning from large-scale datasets [1], while,WideDTA, proposed by Öztürk et al., expands this by utilizing wide and deep learning architectures to capture both local and global interactions [2].

Graph neural networks (GNNs) have also gained prominence in DTA prediction, exemplified by GraphDTA, which integrates molecular graphs to model complex interactions between drugs and targets [3]. MSGNN-DTA further enhances predictive accuracy by fusing multi-scale topological features using GNNs [4]. He et al., integrated sequence and graph information, which further improved prediction capabilities [5]. Moreover, advancements like BindingSite-AugmentedDTA [6] and NHGNN-DTA highlight the trend towards interpretable models that elucidate the molecular basis of binding affinity predictions, crucial for drug repurposing and personalized medicine [7].

It is important to note that predicting those interactions is a pre-requisite for other processes namely drug-drug interactions prediction, drug repurposing, and side effect prediction [8]. The procedure of DTI prediction takes into account the various effects of the drug on protein named targets [9]. In this context, there are some popular DTIs that have a greater effect on accuracy. The process of drug development entails disease analysis, followed by discovery and development, preclinical research, clinical research, FDA reviews, and FDA post market safety monitoring. This procedure is also time-consuming, expensive, and laborious. Drug approval takes time since it needs to undergo rigorous laboratory testing. However, years of testing, may come up short since the medicine may only have minor side effects and not have a major impact on a person's health or well-being. In order to effectively treat an illness, a newly produced medication must be cost-effective, easy to get, and suitable for every individual. However, many medications never reach the market because they are too expensive or may have too many negative effects. Due to the fact that wet lab procedures utilized to make medications for novel diseases take time and may fail in the final clinical trial stage, innovative approaches are now being tried.

Discovering DTI in the lab is expensive and time consuming, and also, the presence of human error is an open challenge in this process. However, computation based DTI prediction is a reliable, inexpensive, viable alternate to blinded laboratory method [10]. It is essential to note that prediction of DTI becomes a link prediction challenge. As a starting point, prediction of DTI faces the problem of a heterogeneous network that could not be resolved by the link prediction method. The network might have two data structures: single-relational involving single type of link or multi-relational including multiple types of links. The computational DTI prediction method is categorized into two types: network-based, and machine learning-based models. The machine learning (ML) based method combines known interaction, drug information, and target information [11] where this information is fed to the learning algorithm. Target domain information and Drug domain information is converted into feature vectors or similarity matrices and the conventional method treats the non-interaction information as negative samples that are not reasonable as non-interaction information, which might have undetected drug-target interaction [12]. Many of the existing techniques are incapable of predicting new drugs without any targets that actually limit the application [13].

Predicting DTIs is a complex task requiring integration of biological data, machine learning, and domain expertise. Feng et al. [14] proposed a deep learning model that significantly improves DTI prediction by using Position Specific Scoring Matrix (PSSM) and Legendre Moment (LM) to extract evolutionary protein features. This model combines protein features with drug molecular substructure fingerprints to create informative feature vectors for drug-target pairs. The Deep Long Short-Term Memory (DeepLSTM) networks are employed to handle sequential data, addressing temporal aspects of drug-target interactions effectively, even with small datasets, while Sparse Principal Component Analysis (SPCA) is used to fuse PubChem fingerprint data with protein evolutionary features, enhancing feature expression and potentially improving model accuracy. Öztürk et al. [1] introduced DeepDTA, a deep learning approach to predict drug-target binding affinities using a convolutional neural network to process drug and protein sequences, but its high computational costs and potential overfitting due to the large number of parameters involved is a potential drawback.

Similarly, Wang et al. [16] presented an automated approach for predicting DTIs, reducing the time and cost of traditional experimental methods. This method leverages deep learning and stacked autoencoders to extract representative features from protein sequences and drug chemical structures combining information on drug molecular structure and protein amino acid sequence, recognizing their combined impact on drug-target connections. The experimental results demonstrate superiority over alternative feature extraction algorithms, classifiers, and current methods in terms of performance. However, the drawbacks lie in the limited interpretability of deep learning models and dependency on the availability and quality of biological data.

DDR (Drug-target interaction prediction) by Olayan et al. [15] uses a heterogeneous network that incorporates diverse drug-drug and protein–protein similarities, alongside established DTIs, employing graph mining algorithms. It integrates various drug-drug and protein–protein similarities through a non-linear fusion approach, enhancing prediction accuracy, capturing complex and non-linear interactions. This method employs heuristic similarity selection in preprocessing to focus on relevant similarities, reducing noise and improving prediction quality. DDR's flexibility in handling diverse prediction tasks, parameters, datasets, and evaluation methods underscores its utility in DTI prediction. The work by Beck et al. [17] focused on repurposing approved medications for SARS-CoV-2 treatment, expediting drug development due to their prior safety evaluations. The deep learning-based model (MT-DTI) excels in capturing intricate connections between medications and viral proteins without heavy reliance on domain-specific information. It identifies pharmaceuticals interacting with SARS-CoV-2 viral proteins, contributing valuable insights into COVID-19 therapeutic options. AutoDock Vina, a traditional 3D structure-based model, highlights MT-DTI's strengths and differences, emphasizing its advantages. The study expands the search to include medications for asthma, COPD, and immunosuppression, unveiling additional therapeutic options. It ranks medication candidates based on binding affinities to viral proteins, guiding further research. However, computational demands for handling heterogeneous networks and sensitivity to hyperparameter and similarity measure choices pose challenges. Furthermore, the study's focus on 3D structure-based models limits comparisons with contemporary methods, potentially overlooking alternative approaches. The MT-DTI model may not entirely avoid domain-specific information and the limited Comparative Analysis study provides a comparison with a 3D structure-based model lacks comparison with other contemporary methods.

A variety of aspects, including sequence-based, structure-based, fingerprint-based, and evolution-based data for describing drug and target information were used by Chen et al. [19]. This method of fusing many pieces of information improves how medication and target pairings are represented. The best features were found without compromising prediction accuracy by using XGBoost for feature selection which enhances the effectiveness and interpretability of the model. A deep neural network with a layer-by-layer learning method is used in the DNN-DTIs model. This deep learning architecture is appropriate for DTI prediction problems because it can extract high-level characteristics from raw DTI data. Thus, it can be concluded that DNN-DTIs is a reliable predictor for DTIs. DNN-DTIs are well suited for a wide range of applications, including drug repositioning, drug design, and understanding drug-target interactions, all of which have major implications for drug discovery because of their precise prediction performance. The major constraint lies in the potential difficulties associated with hyperparameter tuning for achieving optimal performance. Additionally, the approach is hindered by a restricted exploration of advanced deep learning architecture.

The key benefit of the model developed by Zhang et al. [20] emphasizes the acquisition of molecular structure data for medicinal molecules, crucial for understanding DTIs. The model leverages transformer networks, known for their strong performance in natural language processing and sequence-based tasks, effectively capturing interactions and long-range dependencies within chemical structures. It integrates multilayer graph information, revealing intricate patterns unobservable with traditional methods. Additionally, the incorporation of a convolutional neural network (CNN) extracts local residue information from target sequences, enhancing prediction accuracy and the model was tested on DrugBank dataset to demonstrate superior performance compared to models relying solely on target sequence data, highlighting the significant impact of molecular structural information on DTI prediction. The model's potential for drug repositioning is evident in its ability to identify therapeutic medications for diseases like COVID-19 and Alzheimer's, accelerating the discovery of new applications for existing drugs. This can hasten the identification of fresh uses for already-available medications. However, handling extensive datasets may necessitate significant computational resources. The GraphDTA model proposed by Nguyen et al. [3] utilizes a graph neural network to predict drug-target binding affinity by capturing the complex relationships between molecular structures and biological targets. It mainly emphasizes the feature extraction techniques to improve DTI prediction accuracy, but the model’s performance also heavily relies on the quality of the graph representation, which may not effectively capture all relevant features for drugs and targets when dealing with complex chemical structures.

Wang et al. [21] use CNN for predicting DTIs that can be organized using a knowledge graph that is created from drug-target combinations utilizing current information in an organized way which is the goal of this strategy. In order, to extract complicated information from drug-target pairings, a Convolutional-Convolutional (Conv-Conv) module is used, followed by a fully connected neural network, which can improve prediction accuracy. The use of Knowledge Graph based DTI (KG-DTI) to reposition medications for Alzheimer's disease (AD) by targeting apolipoprotein E shows potential and the top-recommended medications gives experimental proof that it may be repositioned to treat AD. Zhao et al. [18] proposed method Graph Convolutional network (GCN-DTI) main benefit is the associations between Drug-Protein Pairs (DPPs) into DTI modeling. GCN enables the extraction of features from DPPs, capturing intricate linkages and network interactions. Here, numerous graph-based tasks have been successfully completed by GCNs. The capacity of the model to identify between real and fake DPP characteristics is increased when GCN and a DNN are used for prediction. GCN-DTI performs noticeably better than state-of-the-art techniques. This demonstrates how well it can identify DTIs. Additionally, there is a shortage of discussion regarding the management of noisy or incomplete information within the graph.

NHGNN-DTA, a node-adaptive hybrid graph neural network proposed by He et al. [7] predicts the binding affinity between drugs and targets. It offers enhanced interpretability and accuracy, but the selection of node features and hyperparameters, is a tedious task which can impact the overall performance further, the complexity of the hybrid graph neural network may require significant computational resources and time for training and tuning.

The challenges faced by using unbalanced data in DTI prediction was solved by Rajpura et al. [22], using a Positive Unlabeled (PU) learning strategy and a one-class support vector machine (SVM) instead of marking unknown pairings as negative. This method considers them as unlabeled, which can improve prediction accuracy through an assessment of drug features thanks to the integration of 4860 Klekota Roth fingerprints and 881 PubChem fingerprints as high-dimensional feature representations for medicines. A thorough knowledge of the suggested approach's performance in diverse circumstances is provided by the evaluation of it using multiple formulations, including known drug-known target, known drug-unknown target, unknown drug-known target, and unknown drug-unknown target settings. It is important to note that by eliminating the presumption that all unknown interactions are negative, Lan et al.’s [23] use of positive-unlabeled learning in the PUDT technique successfully overcomes the problem of dealing with unknown interactions. The integration of several biological data sources pertaining to target proteins by PUDT can result in a more comprehensive understanding of drug-target interactions. This may increase prediction accuracy by capturing a wider range of target protein characteristics. PUDT takes target protein structure into account in addition to sequence information. This technique proves the capacity to find possible drug-target interactions, which is useful for drug discovery and design, by reviewing the literature and data bases that are currently accessible while, Chen et al. [8] reviewed various machine learning approaches for DTI prediction, including Positive-Unlabeled learning strategies which struggles with noisy or incomplete data thereby resulting in inaccurate prediction.

Haddadi et al.’s [24] positive-unlabeled learning (PUL) technique broadens BLM to improve the predictability of drug-target interactions. It offers a more accurate depiction of the dataset by treating all unlabeled interactions as unlabeled. Drug development is made more productive by using computational techniques like BLM with PUL instead of labour-intensive laboratory studies to anticipate drug-target interactions. These approaches face certain constraints including sensitivity to the selection of fingerprints and feature representations with a restricted usage of advanced machine learning techniques. However, the methods depend on the availability and reliability of diverse biological data sources, with limited investigation into interactions beyond enzymes, ion channels, GPCRs, and nuclear receptors. They also rely on the assumption that unlabeled interactions are genuinely negative and face challenges with noisy or insufficient data.

The earlier studies mentioned in the literature had a number of drawbacks. Although, the models used in the literature perform well with limited datasets, it's crucial to take into account whether the findings can be applied to a wider variety of drug-target interactions. The datasets with limited or unbalanced data, could not perform well. The models that use heterogeneous networks, non-linear fusion techniques, and heuristic similarity selection may need a lot of computer power and experience to implement and fine-tune. The choice of hyperparameters and similarity measures may have an impact on how well these approaches work, and adequate parameter tweaking, take time to get the best results. The presented models have a high degree of accuracy, but there is still opportunity for development and research methods such as gated recurrent units and recurrent neural networks (RNN), which would improve DTI prediction and data mining. The predictions were impacted by noisy or insufficient data sources, in order, to overcome the lack of confirmed negative samples and increase prediction accuracy, positive-unlabeled learning is used, however its effectiveness may vary depending on the dataset's properties and parameter selections. Docking techniques favour interactions for whose structural details are known while excluding fresh DTIs with unidentified structures. In conclusion, the approaches stated in the literature does not addresses all the difficulties associated with predicting drug-target interactions, but our proposed method deals with significant difficulties in predicting DTIs.

Our proposed novel Drug-target interaction prediction with Wind-Enhanced GRU (DTIP-WINDGRU) model’s pre-processing and class labeling are the main functions of DTIP that we have suggested. Additionally, D-D and T-T interactions are used for DTI prediction as well as to initialize the weights of the GRU model. The inclusion of D-D and T-T interactions in initializing the weights of the GRU model implies an effort to capture recurring patterns and enhances the generality of the model. The inclusion of GRU indicates an effort to investigate more cutting-edge methods for DTI prediction. In order to choose the GRU model's hyperparameters optimally, the WDO technique is used. The WDO algorithm's use for hyperparameter optimization implies an effort to automatically choose complicated model parameters. As a result, researchers without extensive experience in hyperparameter tuning may find the model easier to use. The WDO technique may be used to pick hyperparameters and aid in quickly locating the best ones which paves way for the questions raised about the selection of hyperparameters. The WDO approach to optimize the model may assist reduce problems caused by noisy or inadequate data. GRU and WDO indicates an alternate method to DTI prediction that may get around some of the problems with docking approaches, even if the precise limits of docking techniques are not highlighted in the literature. In order to ensure the DTIP-WINDGRU model's predictive outcomes are accurate experimental approach is carried out on four datasets.

The remainder of this paper is organized as follows. Sect. “Methods” explains the methods proposed and Sect. “Results and discussion” provides a detailed description of the results and discussion and Sect. “Conclusion” concludes the paper.

Methods

The proposed model

The DTIP-WINDGRU model uses pre-processing procedures to get the data ready for prediction. Additionally, class labeling is involved, which entails giving known interactions labels (such as positive or negative) for unlabelled instances in the data. D-D and T-T interactions are taken into consideration during the actual prediction process using the GRU. On the account that the protein sequences are lengthy, a basic CNN is unable to adequately capture the context dependencies within them. Consequently, we enhance it by interpreting the protein sequences as time series and use the GRU to extract their features. In order to ensure that the model performs better in protein feature extraction, we employ GRU to capture the long-term dependencies in order to handle the feature extraction problem for lengthy amino acid sequences. The findings of the four models have been improved to varying degrees. A GRU, a kind of recurrent neural network (RNN), is at the centre of the model. The model's weights are initialized using these interactions, facilitating the model's ability to learn and produce accurate predictions. The model uses the WDO method in the concluding step. The GRU model's optimal hyperparameters are chosen using the model behavior-influencing settings, which is essential for obtaining reliable findings. A new method for DTI prediction combines a GRU model with D-D and T-T interactions. It is potentially beneficial to optimize hyperparameters using the WDO technique. The model's versatility in tackling DTI prediction difficulties is indicated by its ability to handle labeled and unlabeled data. Overall, the proposed DTIP-WINDGRU model provides a unique viewpoint on DTI prediction and includes innovative components to enhance precision and effectiveness in this significant field of study. The process model for the DTIP-WINDGRU approach is shown in Fig. 1.Fig. 1. Process Model for the DTIP-WINDGRU Approach

The step-by-step explanation of the model’s workflow includes:

Step 1: Data Pre-processing:

Class Labeling: We assign labels to known interactions (positive or negative) for previously unlabeled instances in the dataset.

Interaction Consideration: The D-D and T-T interactions are taken into account to enrich the data.

Step 2: Feature Extraction:

Protein Sequence Processing: The protein sequences are interpreted as time series data.

GRU: A GRU is employed to extract features, capturing long-term dependencies within the sequences.

Step 3: Model Initialization:

Weight Initialization: The model's weights are initialized using the interaction data, facilitating effective learning.

Step 4: Prediction Process:

GRU-Based Prediction: The GRU model processes the input data to predict drug-target interactions.

Step 5: Hyperparameter Optimization:

WDO Method: The WDO technique is used to optimize the GRU model’s hyperparameters, ensuring reliable and accurate predictions.

Dataset description

The performance of the proposed method is validated by utilizing the gold standard dataset. The databases DrugBank, KEGG BRITE, SuperTarget, BRENDA, and others are used to find drug-target interactions but due to bias concerns caused by the greater number of negative samples than positive data, DTI pairings only include positive interactions since negative interactions have not been empirically proven. The negative interactions must be reprocessed to return to their original state since they lack experimental support, which may result in the emergence of new positive interactions.

The gold standard dataset was originally compiled by Yamanishi et al. (2008), integrating information from multiple databases including DrugBank, BRENDA, SuperTarget, and KEGG BRITE. It provides benchmark datasets for four classes of target proteins: Enzymes, Ion Channels, G-Protein Coupled Receptors (GPCRs), and Nuclear Receptors. Each dataset consists of known positive drug-target interactions, with drugs and targets represented using chemical structure and protein sequence similarity matrices respectively. The dataset is publicly available at: http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/

Pre-processing and class labeling

In the proposed work, we have used four DTIP data sets such as Ion Channel, Enzyme, Nuclear Receptor, and GPCR. The drug chemical data is attained in DRUG AND COMPOUND Section in the KEGG LIGAND. The chemical formation resemblance amongst the components is defined by SIMCOMP which provides a scoring value based on the size of substructure with graph alignment. The target protein sequence similarity is evaluated using the Smith-Waterman technique. It is used to achieve local sequence alignment and to find similarities between two sequences (target protein amino acid sequences). The Smith-Waterman algorithm concentrates on locating the best local alignments, useful for locating homologous or functional domains in proteins by understanding the connections and probable functional similarity between various proteins. The sequence similarity amongst the target is defined by the normalized Smith–Waterman method, based on the data of amino acid order of target protein derived from KEGG GENE data base. A pair of proteins $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$A_{i}$$\end{document}$ and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$A_{j}$$\end{document}$ , the sequence similarity amongst them can be defined by the following equation:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Sim_{{seq\left( {A_{i} ,A_{j} } \right)}} = \frac{{SW\left( {A_{i} ,A_{j} } \right)}}{{\sqrt {SW\left( {A_{i} ,A_{i} } \right)} \sqrt {SW\left( {A_{j} ,A_{j} } \right)} }}$$\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$SW\left( {A_{i} , A_{j} } \right)$$\end{document}$ denotes the score of Smith–Waterman method. Next, the proposed approach undergoes class labelling where the target interaction exists in the dataset are utilized for labelling the unknown instance.

The class labeling method categorizes instances in a dataset, particularly those connected to target interactions. The process of class labeling entails categorizing or labeling the data instances using target interactions. The target interactions and their weights are used to label the unlabeled instances in the DTI dataset. The unknown instances, or data points without labels indicating whether the interaction exists or not, are subjected to the class labelling procedure. This technique labels ambiguous occurrences by using the positive interactions from known target interaction information. The unknown instances are classified as positive interactions if they have similarities with known target interactions; otherwise, they are classified as negative interactions. In essence, this class labeling phase uses weights from RNN, where the algorithm learns from labeled examples to generate predictions or classifications on fresh, unlabeled data, to help categorize data instances in the dataset.

Design of GRU based predictive model

In a RNN, the NN utilizes recurrent connection in all the units. The activations of neurons are fed into weights, and each unit incorporates a time delay along with a hidden memory value. This enables the network to learn the temporal dynamics of successive information by retaining historical activations [22]. Assuming a temporal input sequence $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a^{l} = \left( {a_{1}^{l} ,{ } \ldots ,a_{T}^{l} } \right)$$\end{document}$ of length $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T$$\end{document}$ ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_{t,i}^{l}$$\end{document}$ the activation of unit $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$i$$\end{document}$ in hidden layer $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l$$\end{document}$ at time $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t$$\end{document}$ ), an RNN maps to a series of hidden value $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$h^{l} = \left( {h_{1}^{l} ,{ } \ldots ,h_{T}^{l} } \right)$$\end{document}$ and outputs a sequence of activation $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a^{{\left( {l + 1} \right)}} = \left( {a_{1}^{{\left( {l + 1} \right)}} ,{ } \ldots ,a_{T}^{{\left( {l + 1} \right)}} } \right)$$\end{document}$ by iterating through the following equation:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$h_{t}^{l} = \sigma \left( {W_{xh}^{l} a_{t}^{l} + h_{t - 1}^{l} W_{hh}^{l} + b_{h}^{l} } \right)$$\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma$$\end{document}$ refers the nonlinear activation function, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b_{h}^{l}$$\end{document}$ signifies the hidden bias vector and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W$$\end{document}$ stands for the weight matrices, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W_{xh}^{l}$$\end{document}$ represents the input‐hidden weight matrix and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W_{hh}^{l}$$\end{document}$ implies the hidden‐hidden weight matrix. It represents the weights connecting the hidden layer at time t − 1 to the hidden layer at time t in the RNN equation [9]. Here, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W_{hh}^{l}$$\end{document}$ captures the influence of the previous hidden state on the current hidden state. It is a matrix of weights governing the recurrence in the hidden layer. It was determined in the following:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_{t}^{{\left( {l + 1} \right)}} = h_{t}^{l} W_{ha}^{l} + b_{a}^{l}$$\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W_{ha}^{l}$$\end{document}$ denotes the hidden activation weight matrix and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b_{a}^{l}$$\end{document}$ signifies the activation bias vector. It is worth highlighting the weight matrix $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W^{l}$$\end{document}$ determined for the MLP corresponds to $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W_{xh}^{l}$$\end{document}$ matrix in Eq. (2). Figure 2 depicts the structure of GRU.Fig. 2. Framework of GRU

These kinds of networks have Turing abilities [23] and, therefore, they are suitable for learning sequences but the memory model makes learning difficult to handle real‐time sequence processing. The LSTM extended RNN with memory cell, rather than recurrent unit, for storing and resultant data, easing the learning of temporal connections on long time scale. The GRU has a basic form of LSTM that is also a type of RNN. Unlike LSTM, GRU combines forget and input gates. In consideration of a network with h hidden layers, a small‐batch input at a given time step of $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t$$\end{document}$ is denoted as $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X_{t} \in {\mathbb{R}}^{{n{*}d}}$$\end{document}$ , and the hidden state at the preceding time step t1 is $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_{t - 1} \in {\mathbb{R}}^{{n{*}h}}$$\end{document}$ . The resulting hidden layer $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$h$$\end{document}$ of single GRU at existing time step $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t$$\end{document}$ :

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_{t} = \sigma \left( {X_{t} W_{xr} + H_{t - 1} W_{hr} + b_{r} } \right)$$\end{document}

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Z_{t} = \sigma \left( {X_{t} W_{xz} + H_{t - 1} W_{hz} + b_{z} } \right)$$\end{document}

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\overline{H} = {\text{ tan }}h\left( {X_{t} W_{xh} + \left( {R_{t} E \odot H_{t - 1} } \right)W_{hh} + b_{h} } \right)$$\end{document}

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_{t} = \left( {1 - Z_{t} } \right) \odot H_{t - 1} + Z_{t} E \odot \tilde{H}_{t}$$\end{document}

Here, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma$$\end{document}$ represents the sigmoid activation function, defined as $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma \left( x \right) = 1/1 + e^{ - x}$$\end{document}$ . The weights connecting the hidden layer and update gate are represented by $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W_{hz}$$\end{document}$ . The weights $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\text{connecting input layer and reset gate are denoted by }}W_{xr} , while W_{hz} {\text{and }}W_{xr}$$\end{document}$ represents the hidden layer and reset gate, input layer and update gate. Here, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b_{r}$$\end{document}$ and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$b_{z}$$\end{document}$ indicate the bias of reset and update gates, while $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H_{t}$$\end{document}$ points toward the candidate hiding state of existing time step denoted by $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t.{ }$$\end{document}$ The symbol $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\odot$$\end{document}$ represent the matrix multiplication of two components, and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Tanh$$\end{document}$ characterizes the hyperbolic tangent activation function:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\text{ tanh }}\left( x \right) = 1 - \frac{2}{{1 + e^{ - 2x} }}$$\end{document}

Hyperparameter optimization

The optimal adjustment of the hyperparameters involved in the GRU model, includes WDO algorithm. The WDO algorithm is derived from the concept of atmosphere driven by forces such as wind, exhibit optimization-like characteristics. It is believed that by integrating these ideas into an optimization algorithm, the WDO method provides special benefits in terms of effectiveness and scalability [24]. The potential utility of the WDO algorithm lies in its capacity to navigate complex solution spaces and identify optimal configurations [31]. The algorithm has characteristics advantageous for maximizing the selection of therapeutic targets inspired by atmospheric dynamics. It is used to improve prediction accuracy with specific biological targets, contributing to the drug discovery process by precisely identifying potential therapies. The basis of WDO approach is Newton’s second law of motion that is utilized for providing precise results for analyzing atmospheric motion in the Lagrangian [24]:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho \vec{\alpha } = \sum \vec{F}_{i} ,$$\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vec{\alpha }$$\end{document}$ represent the acceleration, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho$$\end{document}$ indicates the air density for infinitesimal air parcel, and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vec{F}$$\end{document}$ denotes each force act on the air parcel. The air pressure establishes the relation with air parcel temperature and density, it can be shown as follows

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$P = \rho RT,$$\end{document}

Now $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$P$$\end{document}$ indicates the pressure, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R$$\end{document}$ denotes the universal gas constant, and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T$$\end{document}$ represents the temperature. The reason of air movement is because of the integration of several forces, largely including gravitational force $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\left( {\vec{F}_{G} } \right)$$\end{document}$ , pressure gradient force $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\left( {\vec{F}_{PG} } \right)$$\end{document}$ , Coriolis force $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\left( {\vec{F}_{C} } \right)$$\end{document}$ , and friction force $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\left( {\vec{F}_{F} } \right)$$\end{document}$ . it is described in the following

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vec{F}_{G} = \rho \delta V\vec{g},$$\end{document}

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vec{F}_{PG} = - \nabla P\delta V,$$\end{document}

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vec{F}_{C} = - 2\Omega \times \vec{u},$$\end{document}

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vec{F}_{F} = - \rho \alpha \vec{u},$$\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\delta V$$\end{document}$ denotes finite volume of the air, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vec{g}$$\end{document}$ indicates the gravitational acceleration, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\nabla P$$\end{document}$ characterizes the pressure gradient, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Omega$$\end{document}$ denotes rotation of the Earth, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vec{u}$$\end{document}$ epitomizes the velocity vector of the wind, and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha$$\end{document}$ signifies the friction coefficient. The abovementioned forces are added to (10).

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho \frac{{\vartriangle \vec{u}}}{\vartriangle t} = \left( {\rho \delta V\vec{g}} \right) + \left( { - \nabla P\delta V} \right) + \left( { - 2\Omega \times \vec{u}} \right) + \left( { - \rho \alpha \vec{u}} \right),$$\end{document}

where the acceleration $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vec{\alpha }$$\end{document}$ in (1) is expressed as $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vec{\alpha } = \vartriangle \vec{u}/\vartriangle t$$\end{document}$ ; for simplicity set $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vartriangle t = 1$$\end{document}$ ; for infinitesimal air parcel, set $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\delta V = 1$$\end{document}$ , that simplify (4) to

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho \vartriangle \vec{u} = \left( {\rho \vec{g}} \right) + \left( { - \nabla P} \right) + \left( { - 2\Omega \times \vec{u}} \right) + \left( { - \rho \alpha \vec{u}} \right).$$\end{document}

Based on (11), the density $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho$$\end{document}$ is expressed interms of the pressure;

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vartriangle \vec{u} = \vec{g} + \left( { - \nabla P\frac{RT}{{P_{cur} }}} \right) + \left( {\frac{{ - 2\Omega \times \vec{u}RT}}{{P_{cur} }}} \right) + \left( { - \alpha \vec{u}} \right) ,$$\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$P_{cur}$$\end{document}$ indicates the pressure of present position. It is considered in the WDO approach [32] that position and velocity of the air parcel change at all the iterations. Therefore, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vartriangle \vec{u}$$\end{document}$ is expressed as $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta \vec{u} = \vec{u}_{new} - \vec{u}_{cur}$$\end{document}$ , in which $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vec{u}$$\end{document}$ indicates the velocity in following iteration and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vec{u}$$\end{document}$ indicates the velocity at the present iteration. $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vec{g}$$\end{document}$ and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\nabla P$$\end{document}$ denotes vectors, they are broken down in direction and magnitude as $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vec{g} = \left| g \right|\left( {0 - x_{cur} } \right),$$\end{document}$ $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$- \nabla P = \left| {P_{opt} - P_{cur} } \right|\left( {x_{opt} - x_{cur} } \right),$$\end{document}$ $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$P_{opt}$$\end{document}$ denotes the optimal pressure point, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x_{opt}$$\end{document}$ indicates the optimal position, and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x_{cur}$$\end{document}$ represent the existing position; update (14) with the new equation, (14) as follows

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vec{u}_{new} = \left( {1 - \alpha } \right)\vec{u}_{cur} - gx_{cur} + \left( {\frac{RT}{{P_{cur} }}\left| {P_{opt} - P_{cur} } \right|\left( {x_{opt} - x_{cur} } \right)} \right) + \left( {\frac{{ - 2\Omega \times \vec{u}RT}}{{P_{cur} }}} \right).$$\end{document}

At last, there are three further substitutions required. The actual pressure value is substituted with rank amongst each air parcel according to the pressure value, the resultant formula of updating the velocity is defined in the following [19]:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vec{u}_{new} = \left( {1 - \alpha } \right)\vec{u}_{cur} - gx_{cur} + \left( {RT\left| {1 - \frac{1}{i}} \right|\left( {x_{opt} - x_{cur} } \right)} \right) + \left( {\frac{{c\vec{u}_{cur}^{other dim} }}{i}} \right)$$\end{document}

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vec{x}_{new} = \vec{x}_{cur} + \left( {\vec{u}_{new} \times \Delta t} \right),$$\end{document}

whereas $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$i$$\end{document}$ denotes the ranking amongst each air parcel and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vec{x}_{new}$$\end{document}$ indicates the novel position for the following iteration.

The WDO system develops a fitness function (FF) for reaching enhanced classifier results. It defines a positive integer for demonstrating the best efficiency of the candidate solution. In case of, minimizing of the classification error rate was regarded as FF is offered in Eq. (18).

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$fitness\left( {x_{i} } \right) = Error\,Rate\left( {x_{i} } \right)$$\end{document}

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$= \frac{number\; of\; misclassified\; samples}{{Total \;number\; of\; samples}}*100$$\end{document}

In the earlier sections, a prediction model based on the GRU with wind-driven optimization techniques [25] to carry out a thorough hyperparameter optimization was designed to predict DTIs. The complex process of parameter tuning plays a vital link between the design of our proposed approach and the validation stage that follows. As we proceed with the validation phase, our aim is to provide a smooth transition between these stages described in Fig. 1 and clarify how the improved parameters and the model’s approach supports the stability and effectiveness of the prediction model.

Results and discussion

In this section, the performance validation of the DTIP-WINDGRU model is carried out using four datasets [26] namely enzyme, Ion channel, GPCR, and nuclear receptor datasets. The details related to the dataset are given in Table 1.Table 1. Dataset DescriptionsDatabaseDrugTargetInteractionsEnzyme-dataset4456642926Ion channel-dataset2102041467GPCR-dataset22395635Nuclear receptor-dataset542690

Table 2 and Fig. 3 portray the labeled and unlabeled results offered by the DTIP-WINDGRU model on distinct top $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$k$$\end{document}$ values. The enzyme dataset with $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$k = 10\%$$\end{document}$ , DTIP-WINDGRU model has obtained 29,082 and 271 unlabeled and labeled instances respectively. In addition, on GPCR dataset with $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$k = 10\%$$\end{document}$ , the DTIP-WINDGRU model has attained 1978 and 249 unlabeled and labeled instances respectively. Simultaneously, on ION channel dataset with $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$k = 10\%$$\end{document}$ , the DTIP-WINDGRU model has gained 3962 and 459 unlabeled and labeled instances respectively while on nuclear receptor dataset with $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$k = 10\%$$\end{document}$ , the DTIP-WINDGRU model has provided 102 and 42 unlabeled and labeled instances respectively.Table 2. Labeled data analysis of DTIP-WINDGRU modelEnzyme datasetGPCR datasetTop k (%)UnlabelledLabelledTop k (%)UnlabelledLabelled1029,0822711019782492058,2583932039943683087,38156030597748640116,49275140795860350145,64187950995469660174,9388566012,03770070204,06010217014,01578280233,22211418016,03782590262,52211279018,098819100291,763116910020,146865ION channel dataset**Nuclear receptor dataset103962459101024220801877020214633012,021100030340704016,012120940448935020,041140750578936024,2181424607001017028,2201608708281038032,2621728809451169036,442172790104015110040,56317701001166155Fig. 3Labeled Data Analysis of DTIP-WINDGRU model. a Enzyme dataset, b G Protein-Coupled Receptors dataset, c ION Channel dataset, d Nuclear Receptor dataset, showing labeled and unlabeled instances at various top k% values

Table 3 and Fig. 4 report detailed AUC and AUPR outcomes of the DTIP-WINDGRU model on the test datasets applied. The experimental results revealed that the DTIP-WINDGRU model has accomplished effective values of AUC and AUPR under distinct seed values for instance, with enzyme dataset and CV_SEED value of 3201, the DTIP-WINDGRU model has provided AUC and AUPR of 95.91%, and 57.43% respectively. In GPCR dataset and CV_SEED value of 3201, the DTIP-WDOGRU model has gained AUC and AUPR of 86.87%, and 65.65% respectively. Moreover, with ION Channel dataset and CV_SEED value of 3201, the DTIP-WDOGRU model has accomplished AUC and AUPR of 84.48%, and 66.71% respectively. Furthermore, with Nuclear Receptor dataset and CV_SEED value of 3201, the DTIP-WDOGRU model has provided AUC and AUPR of 92.31%, and 78.56% respectively.Table 3AUC and AUPR on Different CV_SEEDs Values of Applied DatasetEnzyme datasetGPCR datasetCV_SEEDAUCAUPRCV_SEEDAUCAUPR320195.9157.43320186.8765.65203397.5368.43203385.5668.76517996.5669.87517987.3665.54293190.6161.21293189.2362.75911798.7772.92911795.8669.55ION channel dataset**Nuclear receptor dataset320184.4866.71320192.3178.56203389.4762.54203395.7681.11517993.5665.36517997.9985.15293185.6367.87293198.1988.22911794.0972.53911799.2689.65Fig. 4AUC and AUPR analysis of DTIP-WINDGRU model on various CV_SEEDs values. a Enzyme dataset, b G Protein-Coupled Receptors dataset, c ION Channel dataset, d Nuclear Receptor dataset, displaying AUC and AUPR performance across five cross-validation sets

A detailed comparative result analysis of the DTIP-WINDGRU model with recent models in terms of AUC on four test datasets is provided in Table 4.Table 4AUC analysis of DTIP-WINDGRU technique with existing methods under four datasetsMethodsEnzymeGPCRION ChannelNuclear receptorUDTPP (Lan et al. 2016)86.0087.6077.5080.00Bi-gram PSSM (Rajpura and Ngom, 2018)94.8088.9087.2086.90Nearest Neighbour (Rajpura and Ngom, 2018)89.8088.9085.2082.00IFB Model (Rajpura and Ngom, 2018)84.5081.2073.1083.00KBMF2 K (Rajpura and Ngom, 2018)83.2085.7079.9082.40DBSI (Rajpura and Ngom, 2018)80.6080.3080.3075.90ColdDTA (Fang et al. 2023)92.0090.5088.0091.50Drug Pocket Identification (Zhang et al., 2025)85.289.783.588.1DTIP-WINDGRU (Proposed)98.7795.8694.0999.26

Table 4 shows the comparative AUC results of the DTIP-WINDGRU model with existing models on enzyme and GPCR datasets. The results indicated that the DTIP-WINDGRU model has gained effectual results with maximum AUC on both datasets. The enzyme dataset, using DTIP-WINDGRU model has offered higher AUC of 98.77% whereas the UDTPP, Bi-gram PSSM, Nearest Neighbor, IFB, KBMF2 K, DBSI, ColdDTA [30] and Drug Pocket Identification models [33] have obtained lower AUC Of 86%, 94.80%, 89.80%, 84.50%, 83.20%, 80.60%, 92% and 85.2% respectively. The GPCR dataset utilizing DTIP-WINDGRU technique has accessible higher AUC of 95.86% whereas the UDTPP, Bi-gram PSSM, Nearest Neighbor, IFB, KBMF2 K, DBSI, ColdDTA and Drug Pocket Identification methods have reached lower AUC Of 87.60%, 88.90%, 88.90%, 81.20%, 85.70%, 80.30%, 90.50% and 89.7% correspondingly.

The comparative AUC results of the DTIP-WINDGRU methodology with existing models on ION channel and nuclear receptor datasets indicate that the DTIP-WINDGRU model has gained effective results with higher AUC on both datasets. The ION channel dataset, utilizing DTIP-WINDGRU methodology has offered higher AUC of 94.09% whereas the UDTPP, Bi-gram PSSM, Nearest Neighbor, IFB, KBMF2 K, DBSI, ColdDTA and Drug Pocket Identification systems have gained lower AUC Of 77.50%, 87.20%, 85.20%, 73.10%, 79.90%, 80.30%, 88% and 83.5% respectively. DTIP-WINDGRU algorithm on Nuclear receptor dataset has offered higher AUC of 99.26% whereas the UDTPP, Bi-gram PSSM, Nearest Neighbor, IFB, KBMF2 K, DBSI, ColdDTA and Drug Pocket Identification techniques have obtained minimal AUC Of 80%, 86.90%, 82%, 83%, 82.40%, 75.90%, 91.50% and 88.1% correspondingly.

A brief comparative result analysis of the DTIP-WINDGRU technique with recent methods in terms of AUPR on four test datasets is provided in Table 5 [27–29]. Figure 5 examines the comparative AUPR results of the DTIP-WINDGRU approach with existing models on enzyme and GPCR datasets. The results show that the DTIP-WINDGRU model has gained effective results with maximum AUPR on both datasets. The DTIP-WINDGRU system has offered higher AUPR of 72.92% on enzyme dataset whereas the BLM, SELF-BLM, PULBLM-3, PULBLM-5, and PULBLM-7 techniques have obtained lower AUPR of 57%, 63%, 67%, and 66% correspondingly. In addition, on GPCR dataset, the DTIP-WINDGRU approach has offered higher AUPR of 69.55% whereas the BLM, SELF-BLM, PULBLM-3, PULBLM-5, and PULBLM-7 models have obtained lower AUPR of 55%, 60%, 64%, 64%, and 65% correspondingly.Table 5AUPR analysis of DTIP-WINDGRU technique with existing methods under four datasetsMethodsEnzymeGPCRION ChannelNuclear ReceptorBLM (Haddadi and Keyvanpour, 2018)57.0055.0047.0042.00SELF-BLM (Keyvanpour, 2018)63.0060.0051.0045.00PULBLM-3 (Keyvanpour, 2018)67.0064.0060.0058.00PULBLM-5 (Lan et al. 2016)67.0064.0061.0059.00PULBLM-7 (Haddadi and Keyvanpour, 2018)66.0065.0063.0059.00DTIP-WINDGRU (Proposed)72.9269.5572.5389.65Fig. 5AUPR analysis of DTIP-WINDGRU model on Enzyme and GPCR datasets

Figure 6 shows the comparative AUPR results of the DTIP-WINDGRU algorithm with existing models on ION channel and nuclear receptor datasets. The results designated that the DTIP-WINDGRU technique has gained effectual results with maximal AUPR on both datasets. DTIP-WINDGRU model has offered higher AUPR of 72.53% on ION channel dataset whereas the BLM, SELF-BLM, PULBLM-3, PULBLM-5, and PULBLM-7 models have obtained decreased AUPR Of 47%, 51%, 60%, 61%, and 63% correspondingly. Eventually, on Nuclear receptor dataset, the DTIP-WINDGRU approach has offered higher AUPR of 89.65% whereas the BLM, SELF-BLM, PULBLM-3, PULBLM-5, and PULBLM-7 models have obtained lower AUPR of 42%, 45%, 58%, 59%, and 59% correspondingly.Fig. 6AUPR analysis of DTIP-WINDGRU model on ION channel and nuclear receptor datasets

After examining the comprehensive experimental analysis, it is concluded that the DTIP-WINDGRU model has accomplished maximum DTI predictive results over the other methods.

Table 6 shows a set of newer interactions for the proposed model DTIP-WINDGRU for each of the four datasets. These newer interactions were identified in publically available databases, proving the model’s efficacy in predicting undiscovered interactions from the available data. These interactions demonstrate the validity of our suggested model and demonstrate its applicability to the repurposing of currently prescribed medications for a wide range of unidentified disorders. It may also be used to real-world situations. For instance, in the case of SARS-CoV-2, it may be possible to find therapeutic compounds using drug repurposing from several heterogeneous sources. The use of computational approaches shortens the time and effort needed to find novel medications in wet laboratories. Our technique can advance the area of personalized medicine by selecting medications that are likely to be beneficial for certain individuals based on their genetic profiles and target protein interactions. Our approach predicts interactions between recently manufactured substances and target proteins, which can speed up the drug development process. This facilitates more effective drug candidates for pharmaceutical corporations, in order to evaluate the accuracy of the prediction, it is also advised to use already-approved medications for a condition that have been experimentally validated and further enhanced. It can also be used to identify interactions relating to uncommon disorders, where there may be a dearth of experimental data. The importance of our method in advancing drug discovery, streamlining drug development pipelines, and enhancing patient outcomes in the healthcare and biomedical fields is highlighted by these real-world applications. Finally, due to the use of deep learning techniques, our methodology requires vast amounts of data. However, the resulting new interaction data may be empirically validated to determine the success rate in the actual world.Table 6. Results of proposed method on gold standard drug-target interaction datasetEnzyme datasetGPCR datasetDrug IDTarget IDInter. ConfDatabaseDrug IDTarget IDInter. ConfDatabaseD00097hsa57430.9888D and MD00283hsa11310.8028DD00569hsa57420.9522DD00283hsa11320.7999DD00418hsa57420.9422DD00283hsa11330.7592DD00448hsa57430.9020D and CD00528hsa11280.7023MD00448hsa57420.8985D and CD00437hsa11280.6819MD02561hsa15650.7007MD00528hsa11290.6541MD00512hsa15650.6607MD00726hsa11290.6346MD00279hsa15650.6481MD00437hsa11290.6263MD00043hsa16360.6071MD01603hsa1540.6065DD00279hsa15570.6037MD00283hsa18140.6015D, C, and MION Channel Dataset**Nuclear Receptor DatasetD00499hsa11390.7066DD00554hsa21000.7089KD03365hsa11370.6518D and CD00312hsa21000.8185KD03365hsa11380.7025DD00067hsa21000.6084KD03365hsa11390.8053DD00898hsa21000.6270KD02207hsa11450.7448KD00962hsa21000.8584KD02173hsa11450.6091DD00443hsa52410.6477DD00611hsa11450.8291KD00443hsa3670.8252DD02356hsa11340.6507MD00586hsa20990.6068MD03742hsa11370.8011DD00951hsa20990.7647DD02356hsa570530.6019MD00954hsa3670.8625D

In order to validate the effectiveness of the WDO module, we conducted ablation experiments where we compared the performance of our DTIP-WINDGRU model with and without the WDO optimization represented in Table 7. The results of these experiments are as follows:

Baseline Model (Without WDO): We trained the DTIP-GRU model using standard grid search and random search methods for hyperparameter tuning. The performance metrics (AUPR and AUROC) were recorded for the ION channel, Nuclear receptor, Enzyme, and GPCR datasets.
Enhanced Model (With WDO): We then trained the DTIP-WINDGRU model using the Wind Driven Optimization algorithm for hyperparameter tuning. The same performance metrics were recorded for a fair comparison. Table 7. Performance evaluation of DTIP-GRU model on gold standard drug-target interaction datasetDatasetAUPR (without WDO) (%)AUPR (with WDO) (%)AUROC (without WDO) (%)AUROC (with WDO) (%)ION channel65.3272.5378.4583.67Nuclear receptor82.1489.6588.2392.15Enzyme70.1276.8482.6787.45GPCR75.8981.4785.3289.72

The ablation experiment results in Figs. 7 and 8 clearly demonstrate the effectiveness of WDO in improving the model’s performance. The DTIP-WINDGRU model, with WDO, consistently achieved higher AUPR and AUROC scores compared to the baseline model without WDO. The incorporation of Wind Driven Optimization in our model significantly enhances its performance by efficiently exploring the hyperparameter space, leading to optimal settings that improve predictive accuracy. The results of the ablation experiments provide strong evidence of the effectiveness of the WDO module in the DTIP-WINDGRU model.Fig. 7. Ablation results of AUPR analysis with and without WDO in all four datasetsFig. 8Ablation results of AUC analysis with and without WDO in all four datasets

Conclusion

This paper introduces a new model called DTIP-WINDGRU that integrates many methodologies and algorithms in order to improve the accuracy and effectiveness of DTIP. D-D and T-T interactions are used for weight initialization, GRU for prediction and hyperparameter optimization using the WDO technique. This is an important development since it enables the model to anticipate both known and prospective novel interactions, which is key for the development of new drugs. An important innovation is the use of the WDO method for hyperparameter optimization. Machine learning models must be tuned for optimal performance, and WDO provides a practical method for locating the best hyperparameters, improving the prediction outcomes. The proposed model has been experimentally validated, which elevates it to be useful for academics and professionals working in the pharmaceutical and biomedical industries. Real-world applications such as facilitating more effective drug candidates for pharmaceutical corporations and identifying interactions related to uncommon disorders highlights the significance of our proposed method. Additionally, developing personalized medicine by choosing medications that are likely to be beneficial for certain individuals, emphasizes the importance of our approach in advancing drug discovery, streamlining the drug development pipelines, and improving patient outcomes in the healthcare and biomedical fields. Our future work in drug discovery advancements include investigating hybrid DL models, which might produce even more precise DTIP predictions by combining several DL architectures or integrating DL with other computational techniques.

Bibliography3

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Öztürk H, Ozkirimli E, Özgür A. Wide DTA: prediction of drug-target binding affinity.2019. ar Xiv preprint ar Xiv:1902.04166.10.1093/bioinformatics/bty 593PMC 612929130423097 · doi ↗ · pubmed ↗
2Feng Q, Dueva E, Cherkasov A, Ester M. Padme: A deep learning-based framework for drug-target interaction prediction. 2018. ar Xiv preprint ar Xiv:1807.09741.
3http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/