A geometric graph-based deep learning model for drug-target affinity prediction

Md Masud Rana; Farjana Tasnim Mukta; Duc D. Nguyen

PMC · DOI:10.1186/s12859-025-06347-2·December 18, 2025

A geometric graph-based deep learning model for drug-target affinity prediction

Md Masud Rana, Farjana Tasnim Mukta, Duc D. Nguyen

PDF

Open Access

TL;DR

This paper introduces DeepGGL, a deep learning model that improves drug-target binding affinity predictions using geometric graphs and attention mechanisms.

Contribution

DeepGGL is a novel deep learning model that combines geometric graph learning with attention and residual connections for drug-target affinity prediction.

Findings

01

DeepGGL outperformed existing models on CASF-2013 and CASF-2016 datasets.

02

The model showed consistent accuracy on CSAR-NRC-HiQ and PDBbind v2019 holdout sets.

03

It effectively captures atom-level interactions across multiple scales in protein-ligand complexes.

Abstract

In structure-based drug design, accurately estimating the binding affinity between a candidate ligand and its protein receptor is a central challenge. Recent advances in artificial intelligence, particularly deep learning, have demonstrated superior performance over traditional empirical and physics-based methods for this task, enabled by the growing availability of structural and experimental affinity data. In this work, we introduce DeepGGL, a deep convolutional neural network that integrates residual connections and an attention mechanism within a geometric graph learning framework. By leveraging multiscale weighted colored bipartite subgraphs, DeepGGL effectively captures fine-grained atom-level interactions in protein-ligand complexes across multiple scales. We benchmarked DeepGGL against established models on CASF-2013 and CASF-2016, where it achieved state-of-the-art performance…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species1

Human immunodeficiency virus 1(no rank)

Chemicals6

Imiquimod Tacrine C.ar Aprepitant amide nitrogen nitrogen

Figures3

Click any figure to enlarge with its caption.

Workflow of the proposed DeepGGL model for protein-ligand binding affinity prediction

Prediction results of DeepGGL on CASF-2016 (left) and CASF-2013 (right) test sets. Each point shows the mean predicted pK for a complex, averaged over five independent training runs, versus its experimentally determined pK

Comparison of DeepGGL, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {DeepGGL}_s$$\end{document}$ , $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {DeepGGL}_m$$\end{document}$ , $\documentclass[12pt]{minimal} \usepackag

Funding1

—NSF

Keywords

Geometric graph learningDeep convolutional neural networkResidual connectionAttentionProtein-ligand binding affinity

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsComputational Drug Discovery Methods · Protein Structure and Dynamics · vaccines and immunoinformatics approaches

Full text

Introduction

Understanding the strength and specificity of protein-ligand interactions is a central challenge in molecular biology and drug discovery. Binding affinity-the quantitative measure of how tightly a ligand binds to its target protein-plays a crucial role in determining the efficacy and selectivity of therapeutic compounds. The binding affinity between a ligand and its protein receptor is often expressed as inhibition constant (Ki), dissociation constant (Kd), or half-maximal inhibitory concentration (IC50). Traditional experimental techniques such as isothermal titration calorimetry (ITC) and surface plasmon resonance (SPR) remain the gold standard for measuring binding affinity but are resource-intensive, time-consuming, and not scalable for high-throughput applications [1, 2]. This has fueled a growing demand for efficient computational approaches capable of delivering accurate affinity predictions to accelerate the drug discovery pipeline.

Computational scoring functions have long served as the cornerstone of structure-based drug discovery. These scoring functions are generally classified into physics-based, empirical, and knowledge-based categories. Physics-based methods rely on force-field calculations to estimate binding free energy by accounting for electrostatic and van der Waals interactions [3, 4], while empirical scoring functions are trained on experimental data to weight specific interaction terms [5, 6]. Knowledge-based scoring functions derive statistical potentials from structural databases to estimate binding propensities [7, 8]. In recent years, machine learning (ML) and deep learning (DL) approaches have emerged as powerful alternatives that can learn complex non-linear relationships from large datasets without relying on hand-crafted features [9, 10].

ML- and DL-based scoring functions can be broadly divided into sequence-based and structure-based approaches. Sequence-based models such as DeepDTA [11] and DeepDTAF [12] use protein sequences and ligand SMILES as input, enabling high-throughput predictions even in the absence of 3D structural data. However, they often lack spatial resolution, limiting their ability to capture the true nature of molecular interactions. In contrast, structure-based models such as RF-Score [9, 13], Pafnucy [14], and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {K}_\text {DEEP}$$\end{document}$ [15] operate directly on the 3D atomic coordinates of protein-ligand complexes, enabling a more detailed modeling of spatial interactions. These models have demonstrated superior predictive power but also face challenges such as generalization to novel chemical spaces and sensitivity to structural noise.

Graph theory provides a versatile and expressive framework for modeling complex molecular systems, particularly in the context of protein-ligand interactions [16–18]. Molecules can be naturally represented as graphs, where atoms serve as nodes and bonds or intermolecular interactions correspond to edges. This abstraction enables the encoding of both structural and interaction-level information in a compact form. Graph theory encompasses several branches, including geometric, algebraic, and topological graph theory. Geometric graph theory emphasizes spatial relationships between nodes, capturing the geometric connectivity critical for understanding local atomic environments [19, 20]. Algebraic graph theory analyzes the spectral properties of graphs using matrices such as the adjacency and Laplacian matrices to extract features related to network connectivity and interaction strength [21–23]. Topological graph theory investigates the embedding of graphs into topological spaces, offering tools such as persistent homology to explore the shape and continuity of molecular structures [24, 25]. Recently, geometric graph learning has emerged as a powerful technique for modeling biomolecular complexes by combining graph-based representations with machine learning [26–28]. This approach leverages geometric multiscale weighted colored subgraphs (MWCG) that encode local interactions between ligand atoms and protein residues, enabling models to capture both spatial organization and physicochemical context. The MWCG framework assigns edge weights based on Euclidean distances and labels interactions using radial basis functions, offering rotation- and translation-invariant representations for protein flexibility analysis and protein-ligand interactions [19, 27]. These graph-based methods facilitate the extraction of interpretable and robust features that reflect critical biophysical interactions, thereby enhancing the accuracy and generalizability of binding affinity prediction models.

While recent structure-based deep learning models for protein-ligand binding prediction have shown encouraging results [29–33], they often rely on limited feature representations and may not fully capture the geometric and physical principles underlying molecular interactions. Developing new deep learning frameworks offers the opportunity to address these limitations by incorporating richer structural encodings, integrating physical priors, and improving the interpretability and generalization of the models. In this study, we introduce DeepGGL, a deep geometric graph learning model for protein-ligand binding affinity prediction. DeepGGL constructs multi-range weighted bipartite subgraphs that encode chemical and geometric interactions between protein and ligand atoms of specific types. These subgraphs capture interaction patterns across multiple spatial scales, reflecting both local binding determinants and long-range structural context. The descriptors are fed into a deep 2D convolutional neural network with residual connections and attention mechanisms, enabling efficient learning of discriminative interaction features. Unlike prior models that rely solely on local contact maps or voxel grids, DeepGGL integrates multi-scale geometric information to form a holistic view of molecular interactions. We validate DeepGGL on standard benchmark datasets, including CASF-2013 and CASF-2016, as well as independent test sets such as CSAR-NRC-HiQ and the PDBbind v2019 holdout set. Across all evaluations, DeepGGL demonstrates superior performance compared to state-of-the-art methods. Ablation studies further show the importance of the residual blocks and attention layers in capturing complex interaction patterns and improving generalization. Overall, DeepGGL offers a robust, interpretable, and scalable approach to binding affinity prediction, offering a promising direction for next-generation structure-based scoring functions in drug discovery pipelines.

Materials and methods

Datasets preparation

The primary dataset used for model training in this study is the PDBbind v2016 dataset (http://www.pdbbind.org.cn), which comprises three overlapping subsets: the general set, the refined set, and the core set, with the latter being a subset of the former. The general set includes all available protein-ligand complexes, while the refined set consists of high-quality complexes with reliable structural and binding affinity data. Each complex in the PDBbind database is annotated with experimentally measured binding affinities, standardized into negative logarithmic scale (pKa) values to ensure consistency across different assay types.

For model evaluation, we employed the widely used CASF-2016 [34] and CASF-2013 [35] benchmark test sets. The CASF-2016 set, corresponding to the core set of PDBbind v2016, contains 285 high-quality protein-ligand complexes with experimentally validated binding affinities. The CASF-2013 set, derived from the core set of PDBbind v2013, comprises 195 complexes organized into 65 clusters, with binding affinities spanning nearly ten orders of magnitude.

To avoid data leakage between training and evaluation, all complexes included in CASF-2016 and CASF-2013 were first excluded from the general set of PDBbind v2016. Subsequently, 1,000 complexes were randomly sampled from the refined set as the validation set, following standard practice [14, 29]. The remaining general set complexes were then used to construct the training set.

Bipartite weighted subgraph descriptors

We encode the geometric and physicochemical properties of each protein–ligand complex by representing the molecular structure as a geometric graph $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ G = (V, E) $$\end{document}$ , where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ V $$\end{document}$ is the set of atoms and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ E $$\end{document}$ denotes edges representing non-covalent interactions, such as hydrophobic/hydrophilic contacts and intra-/inter-molecular forces. To incorporate domain-specific chemical context, we assign atom-type labels to the vertices using separate coloring schemes for protein and ligand atoms. Protein atoms are labeled with residue-level atom types—such as $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha $$\end{document}$ -carbon (CA), $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta $$\end{document}$ -carbon (CB), and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\delta _1$$\end{document}$ -carbon (CD1)—while ligand atoms are classified using SYBYL atom types, which reflect their chemical environments (e.g., sp^3^ carbon (C.3), aromatic carbon (C.ar), aromatic nitrogen (N.ar), and amide nitrogen (N.am)). This dual atom-type coloring enables a fine-grained description of molecular interactions and structural roles.

Interactions between protein and ligand atoms are explicitly modeled by constructing a bipartite weighted subgraph for each complex, where protein and ligand atoms form two disjoint vertex sets. Let $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal {T}}_p$$\end{document}$ and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal {T}}_l$$\end{document}$ denote the sets of atom types in the protein and ligand, respectively. Each atom is represented as a vertex $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$({\textbf{r}}, t)$$\end{document}$ , where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\textbf{r}} \in {\mathbb {R}}^3$$\end{document}$ denotes its spatial coordinate and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t \in {\mathcal {T}}_p \cup {\mathcal {T}}_l$$\end{document}$ is its type. The protein and ligand vertex sets are defined as

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} & V_p = \{ ({\textbf{r}}_i, t^{(p)}_k) \mid {\textbf{r}}_i \in {\mathbb {R}}^3,\ t^{(p)}_k \in {\mathcal {T}}_p;\ i = 1, \ldots , N_p \},\\ & V_l = \{ ({\textbf{r}}_j, t^{(l)}_{k'}) \mid {\textbf{r}}_j \in {\mathbb {R}}^3,\ t^{(l)}_{k'} \in {\mathcal {T}}_l;\ j = 1, \ldots , N_l \}. \end{aligned}$$\end{document}

Edges are introduced between protein and ligand atoms whose Euclidean distance is below a predefined cutoff $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c > 0$$\end{document}$ . The bipartite edge set is defined as

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} E_{pl} = \left\{ \Phi \left( \Vert {\textbf{r}}_i - {\textbf{r}}_j\Vert ;\ \eta _{kk'} \right) \,\Big |\, ({\textbf{r}}_i, t^{(p)}_k) \in V_p,\ ({\textbf{r}}_j, t^{(l)}_{k'}) \in V_l,\ \Vert {\textbf{r}}_i - {\textbf{r}}_j\Vert \le c \right\} , \end{aligned}$$\end{document}

where the weight function $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Phi $$\end{document}$ encodes the strength of the interaction. The characteristic interaction distance is defined as

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \eta _{kk'} = \eta (r_k + r_{k'}), \end{aligned}$$\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r_k$$\end{document}$ and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r_{k'}$$\end{document}$ are the van der Waals radii of atom types $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t^{(p)}_k$$\end{document}$ and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t^{(l)}_{k'}$$\end{document}$ , respectively, and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\eta > 0$$\end{document}$ is a scaling factor. The function $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Phi (d; \eta _{kk'})$$\end{document}$ is chosen to satisfy

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \Phi (d; \eta _{kk'}) \rightarrow 1 \quad \text {as } d \rightarrow 0, \quad \text {and} \quad \Phi (d; \eta _{kk'}) \rightarrow 0 \quad \text {as } d \rightarrow \infty . \end{aligned}$$\end{document}

Two common choices for $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Phi $$\end{document}$ are the generalized exponential and Lorentz functions:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \Phi _E(d; \eta _{kk'}) = \exp \left( - \left( \frac{d}{\eta _{kk'}} \right) ^\kappa \right) , \quad \Phi _L(d; \eta _{kk'}) = \frac{1}{1 + \left( \frac{d}{\eta _{kk'}} \right) ^\kappa }, \quad \kappa > 0. \end{aligned}$$\end{document}

The resulting bipartite subgraph is formally represented as $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G_{pl} = (V_p \cup V_l,\ E_{pl})$$\end{document}$ , and serves as a structural representation of the protein-ligand binding interface. Informative geometric features are extracted from the bipartite subgraph by defining a centrality score for each atom that quantifies its interaction density with atoms of a specific paired type. For a given atom $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$({\textbf{r}}_i, t_k)$$\end{document}$ in the protein and a corresponding atom type $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t_{k'}$$\end{document}$ in the ligand, we compute the geometric centrality $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_{kk'}({\textbf{r}}_i)$$\end{document}$ as the cumulative interaction strength with all ligand atoms of type $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t_{k'}$$\end{document}$ within the cutoff distance. Formally, this is defined as

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} C_{kk'}({\textbf{r}}_i) = \sum _{({\textbf{r}}_j, t_{k'}) \in V_l} \Phi \left( \Vert {\textbf{r}}_i - {\textbf{r}}_j\Vert ;\ \eta _{kk'}\right) \cdot \delta (t_j = t_{k'}), \end{aligned}$$\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\delta (t_j = t_{k'})$$\end{document}$ is an indicator function equal to 1 if the atom type of $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\textbf{r}}_j$$\end{document}$ is $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t_{k'}$$\end{document}$ , and 0 otherwise. This centrality score captures how strongly a given atom is engaged in interactions with a specific ligand atom type, and is similarly defined for ligand atoms interacting with protein atom types. To construct a compact and expressive representation for each bipartite subgraph defined by atom type pair $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(t_k, t_{k'})$$\end{document}$ , we aggregate centrality values computed for all atoms-both protein and ligand-that participate in the subgraph. Let $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_{kk'} = \{C_{kk'}({\textbf{r}}_i)\}_{i=1}^{n_{kk'}}$$\end{document}$ denote the set of centrality values for all relevant atoms associated with the subgraph defined by $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(t_k, t_{k'})$$\end{document}$ . We compute a subgraph-level feature vector by applying a collection of statistical functions:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} {\textbf{f}}_{kk'} = \textrm{Stat}(C_{kk'}) \in {\mathbb {R}}^s, \end{aligned}$$\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textrm{Stat}(\cdot )$$\end{document}$ returns a fixed-length summary vector containing $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s$$\end{document}$ statistical descriptors such as summation, mean, median, variance, standard deviation, minimum, and maximum. This unified representation captures the overall structural and interaction characteristics of the subgraph. The resulting tensor of shape $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$|{\mathcal {T}}_p| \times |{\mathcal {T}}_l| \times s$$\end{document}$ serves as a structured and interpretable encoding of the geometric and chemical interactions within the complex.

Model architecture

The overall workflow for DeepGGL is depicted in Fig. 1. For each protein-ligand complex, geometric graph features are derived from bipartite subgraphs constructed using weighted, atom-type-colored vertices. The set of atom types $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal {T}}_p$$\end{document}$ and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal {T}}_l$$\end{document}$ consists of 37 residue-level atom types for protein and 28 SYBYL atom types for ligand, respectively, extracted from the training set complexes. Atom centrality values are computed for all pairwise protein-ligand atom type combinations and summarized using eight statistical descriptors (count, sum, mean, median, variance, standard deviation, min, max), yielding a feature tensor of size $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$37 \times 28 \times 8 = 8288$$\end{document}$ . Additionally, three distance cutoffs-5 Å (short-range), 10 Å (medium-range), and 15 Å (long-range)-are used to capture multi-range interaction of protein and ligand. The $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$8288$$\end{document}$ -dimensional feature vector ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$37 \times 28 \times 8$$\end{document}$ ) was reshaped into a two-dimensional grid of size $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$74 \times 112$$\end{document}$ for each distance cutoff. This configuration provides a balanced aspect ratio with a minimal difference between length and width, while also ensuring computational efficiency and compatibility with conventional CNN kernel sizes. The reshaping process preserves all feature values and organizes them into a compact, image-like structure that enables effective convolutional feature extraction. The three grids corresponding to short-, medium-, and long-range distance cutoffs were subsequently stacked to construct the final $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$74 \times 112 \times 3$$\end{document}$ tensor, analogous to an RGB image.

Prior to being fed into the network, the raw data is standardized using the StandardScaler from the scikit-learn library, which transforms each feature to have zero mean and unit variance. This normalization ensures that all input features contribute equally to the learning process, prevents dominance of features with larger numeric ranges, and improves the stability and convergence speed of gradient-based optimization.

The standardized input tensor is first processed by a 2D convolutional layer with 32 filters, after which it passes through two successive residual blocks. Residual connections—introduced in the ResNet architecture [36]—enable the network to learn a residual mapping $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ {\mathcal {F}}({\textbf{x}}) = {\mathcal {H}}({\textbf{x}}) - {\textbf{x}} $$\end{document}$ rather than a direct mapping $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal {H}}({\textbf{x}})$$\end{document}$ . The output of a residual block can thus be expressed as

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} {\textbf{y}} = {\mathcal {F}}({\textbf{x}}, \{W_i\}) + {\textbf{x}}, \end{aligned}$$\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\textbf{x}}$$\end{document}$ is the input to the block, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\textbf{y}}$$\end{document}$ is the output, and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{W_i\}$$\end{document}$ denotes the learnable parameters. This formulation allows identity mappings to be propagated directly through the network, thereby mitigating vanishing gradient issues, accelerating convergence, and enabling effective training of deeper models. Here, each residual block contains two 2D convolutional layers, with output channels of 64 and 128 in the first and second blocks, respectively. Batch normalization is applied after each convolution to stabilize learning dynamics, and ReLU activation introduces nonlinearity. The skip connections not only preserve gradient flow but also promote feature reuse, allowing subsequent layers to focus on learning complementary refinements rather than redundant low-level representations. This design facilitates deeper and more expressive feature extraction while reducing the risk of network degradation in performance with increasing depth.Fig. 1. Workflow of the proposed DeepGGL model for protein-ligand binding affinity prediction

The final convolutional output is flattened and passed through a self-attention mechanism, implemented using the built-in Attention layer in TensorFlow Keras. This Luong-style dot-product attention mechanism [37] dynamically reweights features by computing their pairwise similarities, enabling the model to focus on the most informative spatial patterns within the subgraph-based representation. Formally, for query ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Q$$\end{document}$ ), key ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$K$$\end{document}$ ), and value ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V$$\end{document}$ ) matrices derived from the flattened features, the attention output is given by

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \textrm{Attention}(Q, K, V) = \textrm{softmax}\!\left( \frac{Q K^\top }{\sqrt{d_k}} \right) V, \end{aligned}$$\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$d_k$$\end{document}$ is the dimensionality of the key vectors. This mechanism enhances prediction accuracy by allowing the network to assign higher weights to more relevant feature interactions.

Following attention, the model includes two dense layers with 128 and 64 neurons, respectively. Each dense layer is followed by batch normalization and a dropout layer with a rate of 0.1. L2 regularization with a penalty parameter of $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\lambda = 0.01$$\end{document}$ is applied to all dense layers to reduce overfitting and improve generalization. The final output layer is a single linear neuron used to regress the predicted binding affinity ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$pK$$\end{document}$ ).

The model is trained using a custom loss function that combines the Pearson correlation coefficient ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p$$\end{document}$ ) and the root mean square error (RMSE). The composite loss is defined as:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} {\mathcal {L}} = \alpha (1 - R_p) + (1 - \alpha ) \cdot \textrm{RMSE}, \end{aligned}$$\end{document}

which encourages the model to produce predictions that are both accurate in magnitude and highly correlated with the ground truth. The parameter $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha \in [0,1]$$\end{document}$ serves as a weighting factor to balance the contributions of $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p$$\end{document}$ and RMSE.

Stochastic Gradient Descent (SGD) with Nesterov momentum (0.9), a learning rate of 0.001, decay of $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$10^{-6}$$\end{document}$ , and gradient clipping (clipvalue = 0.01) is used as the optimization algorithm. The model is trained with a batch size of 64 for up to 500 epochs. An early stopping mechanism halts training if the validation loss fails to improve by at least 0.001 over 20 consecutive epochs.

The entire model is implemented using the Keras API with TensorFlow as the backend. The only model hyperparameter optimized in this study was the convolutional kernel size and the weighting factor $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha $$\end{document}$ in the loss function. To reduce the complexity of hyperparameter tuning and ensure architectural consistency, the same kernel size was applied to all convolutional layers. Kernel size plays a critical role in determining the receptive field of each convolutional filter, thereby controlling the balance between capturing fine-grained local patterns and broader contextual information within the input representation. The optimal kernel size was determined via five-fold cross-validation on the validation set (details provided in the Supporting Information). As shown in Fig. S1, a kernel size of 7 achieved the highest predictive performance, suggesting that this receptive field effectively captures the spatial dependencies relevant to our protein-ligand subgraph features. The optimal value for $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha $$\end{document}$ was determined by a similar five-fold cross validation. Figure S2 reveals that $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha =0.7$$\end{document}$ achieved the best performance. Consequently, a kernel size of 7 and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha =0.7$$\end{document}$ was adopted in the final model configuration.

Experiments and results

Evaluation metrics

To assess the predictive performance of our DeepGGL model, we employed three widely used regression metrics: the Pearson correlation coefficient ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p$$\end{document}$ ), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE). These metrics capture different aspects of prediction quality and allow for a comprehensive evaluation of model accuracy and robustness.

The Pearson correlation coefficient $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p$$\end{document}$ measures the linear correlation between the predicted and actual binding affinity values. It is defined as:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} R_p = \frac{\sum _{i=1}^{n} (y_i - {\bar{y}})({\hat{y}}_i - \bar{{\hat{y}}})}{\sqrt{\sum _{i=1}^{n}(y_i - {\bar{y}})^2} \cdot \sqrt{\sum _{i=1}^{n}({\hat{y}}_i - \bar{{\hat{y}}})^2}}, \end{aligned}$$\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y_i$$\end{document}$ and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\hat{y}}_i$$\end{document}$ are the true and predicted binding affinity values, respectively, and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\bar{y}}$$\end{document}$ and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{{\hat{y}}}$$\end{document}$ denote their respective means. $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p$$\end{document}$ ranges from $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$-1$$\end{document}$ to $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$1$$\end{document}$ , with $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 1$$\end{document}$ indicating perfect positive correlation, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0$$\end{document}$ indicating no correlation, and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = -1$$\end{document}$ indicating perfect negative correlation. A high $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p$$\end{document}$ value reflects strong predictive consistency with the ground truth.

The RMSE quantifies the average magnitude of prediction error and is particularly sensitive to large deviations. It is given by:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \textrm{RMSE} = \sqrt{\frac{1}{n} \sum _{i=1}^{n} (y_i - {\hat{y}}_i)^2}. \end{aligned}$$\end{document}

Lower RMSE values indicate better predictive accuracy. Because RMSE penalizes larger errors more heavily than smaller ones, it is useful for identifying models that tend to make large deviations from true values.

The MAE provides a measure of average absolute prediction error, offering a more interpretable and less variance-sensitive alternative to RMSE:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \textrm{MAE} = \frac{1}{n} \sum _{i=1}^{n} |y_i - {\hat{y}}_i|. \end{aligned}$$\end{document}

MAE is less influenced by outliers and provides a straightforward measure of average prediction deviation. Lower MAE values denote better model performance.

Together, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p$$\end{document}$ , RMSE, and MAE provide complementary insights: $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p$$\end{document}$ evaluates the strength of the linear relationship between predictions and ground truth, while RMSE and MAE quantify the magnitude of prediction error. Models with high $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p$$\end{document}$ and low RMSE/MAE are considered optimal for regression tasks such as binding affinity prediction.

Predictive power

To assess the predictive performance of DeepGGL, we trained the model independently five times and averaged the resulting predictions to obtain the final output for each protein-ligand complex. This ensemble approach reduces variance and improves robustness. Figure 2 presents scatter plots of the mean predicted values versus experimentally measured binding affinities on the CASF-2016 and CASF-2013 benchmark test sets. DeepGGL demonstrates strong predictive performance, achieving Pearson correlation coefficients exceeding 0.80 across all datasets. Specifically, the model yields an $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p$$\end{document}$ of 0.810 and an RMSE of 1.158 on the validation set; an $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p$$\end{document}$ of 0.868 and an RMSE of 1.15 on CASF-2016; and an $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p$$\end{document}$ of 0.844 with an RMSE of 1.288 on CASF-2013 (see Table 1). These results highlight DeepGGL’s ability to generalize across datasets while maintaining both high correlation and low prediction error. We compared the proposed method with several state-of-the-art approaches summarized in Table S2 of the Supporting Information using the CASF-2016 and CASF-2013 benchmark datasets. All methods listed in Table S2, except OnionNet-2 and DeepTGIN, were trained on the PDBbind v2016 dataset (excluding the corresponding test sets). OnionNet-2 and DeepTGIN were trained on the more recent PDBbind v2019 and v2020 datasets, respectively. It is important to note that the RMSE values reported for AGL-Score and PerSpect are expressed in kcal/mol and were converted to pKd units by dividing by 1.3633 for consistency across methods.Table 1. Results of DeepGGL on different datasetsDataset $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p$$\end{document}$ RMSEMAEtraining set0.9960.2120.165validation set0.8101.1580.882CASF-2016 core set0.8681.1500.900CASF-2013 core set0.8441.2881.071The error metrics are in pKd units

Fig. 2. Prediction results of DeepGGL on CASF-2016 (left) and CASF-2013 (right) test sets. Each point shows the mean predicted pK for a complex, averaged over five independent training runs, versus its experimentally determined pK

Performance on CASF-2016 benchmark

The CASF-2016 core set is a widely used benchmark for evaluating the generalization capability of scoring functions across diverse protein-ligand complexes. As summarized in Table 2, DeepGGL achieves the best performance among both traditional machine learning (ML) and deep learning (DL) methods, with an $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p$$\end{document}$ of 0.868, RMSE of 1.150, and MAE of 0.90.

Compared to recent graph neural network models such as EIGN ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0.857$$\end{document}$ ) and EGNA ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0.842$$\end{document}$ ), DeepGGL shows a notable improvement in both correlation and error metrics. It also outperforms 3D spatially-aware convolutional models like OnionNet ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0.816$$\end{document}$ ) and OnionNet-2 ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0.864$$\end{document}$ ), which rely on distance-based grids but lack subgraph-level geometric representation. Sequence-based models such as DeepDTAF and CAPLA show weaker performance on this structure-based benchmark, underscoring the limitations of omitting 3D spatial information.

Traditional ML-based methods like RF-Score v3 and AGL-Score yield lower correlation coefficients ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0.812$$\end{document}$ and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0.833$$\end{document}$ , respectively) and higher RMSE values, reflecting the advantage of end-to-end learning in DL architectures such as DeepGGL. These results affirm that incorporating geometric and topological information from colored subgraphs into a CNN-attention framework enhances the model’s capacity to learn discriminative patterns for binding affinity prediction.Table 2. Performance comparison of DeepGGL and state-of-the-art SFs on CASF-2016 benchmarkTypeMethod $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p \uparrow $$\end{document}$ RMSE $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\downarrow $$\end{document}$ MAE $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\downarrow $$\end{document}$ MLPerSpect [25]0.8401.265–AGL-Score [21]0.8331.271–PLEC-Linear [38]0.817––DLDeepGGL0.8681.1500.900OnionNet-2 [30]0.8641.164–EIGN [39]0.8571.131–EGNA [32]0.8421.2580.980DeepTGIN [40]0.8341.2030.949OnionNet [29]0.8161.2780.984Pafnucy [14]0.7801.4201.130 $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hbox {CAPLA}^\text {*}$$\end{document}$ [41]0.8431.2000.966 $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hbox {DeepDTAF}^\text {*}$$\end{document}$ [12]0.7891.3551.073The error metrics are in pKd units, and the value in bold indicates the best outcome for the respective metric. The methods marked by an asterisk indicate a 1D sequence-based model. The performance metrics for each method were obtained from their respective original publications

Performance on CASF-2013 benchmark

On the CASF-2013 core set, DeepGGL also achieves the best performance among all evaluated methods, with an $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p$$\end{document}$ of 0.844, RMSE of 1.288, and MAE of 1.071 (Table 3). This benchmark, although older, still poses significant challenges due to its diverse composition and lower sequence similarity to training sets.

DeepGGL outperforms OnionNet-2 ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0.821$$\end{document}$ ) and OnionNet ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0.782$$\end{document}$ ), further supporting the utility of learning features from atom-level subgraph statistics over simple spatial representations. Additionally, graph-based models such as CAPLA ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0.770$$\end{document}$ ) and Pafnucy ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0.700$$\end{document}$ ) perform worse, suggesting that explicit modeling of protein-ligand subgraph interactions at multiple distance scales, as done in DeepGGL, contributes to higher generalization capability.

The performance gap between DeepGGL and sequence-based methods such as DeepDTAF (with a relatively low $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0.608$$\end{document}$ ) is especially pronounced in CASF-2013, indicating that sequence-only models may fail to capture the nuanced spatial and physicochemical interactions necessary for accurate affinity prediction in diverse protein-ligand complexes.

Together, these evaluations demonstrate that DeepGGL not only achieves state-of-the-art performance on benchmark datasets but also generalizes effectively across structurally and compositionally diverse protein-ligand complexes. The integration of subgraph-based geometric features with a deep residual CNN and attention mechanism provides a powerful framework for binding affinity prediction.Table 3. Performance comparison of DeepGGL and state-of-the-art SFs on CASF-2013 benchmarkTypeMethod $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p \uparrow $$\end{document}$ RMSE $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\downarrow $$\end{document}$ MAE $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\downarrow $$\end{document}$ MLPerSpect0.7931.435–AGL-Score0.7921.447–PLEC-Linear0.757––DLDeepGGL0.8441.2881.071EIGN0.8371.249–OnionNet-20.8211.357–DeepTGIN0.7871.3881.123OnionNet0.7821.5031.208Pafnucy0.7001.6201.320 $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hbox {CAPLA}^\text {*}$$\end{document}$ 0.7701.4461.154 $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hbox {DeepDTAF}^\text {*}$$\end{document}$ 0.6082.1031.737The error metrics are in pKd units, and the value in bold indicates the best outcome for the respective metric. The methods marked by an asterisk indicate a 1D sequence-based model. The performance metrics for each method were obtained from their respective original publications

Multi-range interaction

Protein-ligand binding is governed by a complex interplay of atomic interactions that occur over various spatial ranges. To effectively capture this diversity, DeepGGL integrates geometric features extracted at three distinct distance cutoffs: short-range (5Å), medium-range (10Å), and long-range (15Å). This multi-range feature representation enables the model to encode both local atomic contacts and more distal structural interactions that collectively influence binding affinity.

To systematically evaluate the contribution of each spatial scale, we constructed three ablation models- $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {DeepGGL}_\text {s}$$\end{document}$ , $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {DeepGGL}_\text {m}$$\end{document}$ , and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {DeepGGL}_\text {l}$$\end{document}$ -each trained with features derived from only one of the distance cutoffs. All inputs were reshaped to a uniform dimension of (74, 112, 1) to ensure consistent model architecture across variants. Figure 3 illustrates the comparative performance of these models alongside the full DeepGGL model, which integrates features across all three ranges.

On the CASF-2016 benchmark, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {DeepGGL}_\text {m}$$\end{document}$ (medium-range features) achieves the best single-scale performance with an $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p$$\end{document}$ of 0.8600, RMSE of 1.2265, and MAE of 0.9795. This suggests that interactions within a 10,Å radius are particularly informative for predicting binding affinity. $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {DeepGGL}_\text {s}$$\end{document}$ (short-range, 5,Å) performs comparably well, with an $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p$$\end{document}$ of 0.8571, though it shows slightly higher RMSE (1.2362) and MAE (0.9889), indicating that short-range contacts alone are somewhat less predictive. $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {DeepGGL}_\text {l}$$\end{document}$ (long-range, 15,Å) yields a lower $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p$$\end{document}$ of 0.8400 and higher RMSE (1.3054), suggesting that while long-range interactions contribute valuable context, they are less discriminative when used in isolation.

A similar trend is observed on the CASF-2013 benchmark. $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {DeepGGL}_\text {m}$$\end{document}$ again performs best among the ablation models ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0.8423$$\end{document}$ , RMSE = 1.3356, MAE = 1.1203), followed closely by $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {DeepGGL}_\text {s}$$\end{document}$ ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0.8146$$\end{document}$ , RMSE = 1.3919, MAE = 1.1549). $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {DeepGGL}_\text {l}$$\end{document}$ trails with an $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p$$\end{document}$ of 0.8201 and the highest RMSE and MAE values (1.4177 and 1.1918, respectively).

Importantly, the full DeepGGL model-incorporating all three spatial scales-achieves the best overall performance on both benchmarks (see Table 1), underscoring the synergistic value of integrating multi-range features. These findings highlight the complementary roles of short-, medium-, and long-range interactions in capturing the structural determinants of protein-ligand binding. The multi-range representation equips DeepGGL with a more holistic view of the complex interaction landscape, leading to improved generalization and predictive accuracy.Fig. 3. Comparison of DeepGGL, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {DeepGGL}_s$$\end{document}$ , $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {DeepGGL}_m$$\end{document}$ , $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {and DeepGGL}_l$$\end{document}$ on CASF 2016 and CASF 2013 datasets

Generalization ability

To evaluate the generalization capability of DeepGGL beyond the CASF benchmarks, we conducted independent testing on two external datasets: the CSAR NRC-HiQ set and the PDBbind v2019 ‘hold-out set’. These datasets represent protein-ligand complexes not included during model training or validation and span diverse structural and temporal characteristics. Assessing model performance on these sets offers a robust indication of its ability to generalize to previously unseen and more recent protein-ligand pairs, which is crucial for real-world applications in drug discovery.

The Community Structure-Activity Resource (CSAR) dataset [42] is a high-quality benchmark curated to evaluate scoring functions for protein-ligand docking and affinity prediction. We specifically utilize the CSAR NRC-HiQ subset, which contains 343 experimentally resolved protein-ligand complexes. This subset was originally divided into two groups based on deposition time in the Protein Data Bank (PDB), consisting of 176 and 167 complexes, respectively. Following the protocol of Xia et al. [32], we merged the two subsets and removed any complexes that overlapped with the PDBbind v2016 training set to avoid data leakage. The resulting filtered set contains 49 unique complexes, providing a stringent test of model generalization to structurally diverse and temporally disjoint samples.

As shown in Table 4, DeepGGL achieves the highest Pearson correlation coefficient ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0.764$$\end{document}$ ) and the lowest RMSE (1.512) among all evaluated models, with a competitive MAE (1.211). It outperforms state-of-the-art methods such as EGNA ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0.750$$\end{document}$ ), PLEC-Linear ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0.741$$\end{document}$ ), and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {K}_{\textrm{DEEP}}$$\end{document}$ ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0.710$$\end{document}$ ), indicating that its learned geometric representations are highly transferable. Notably, although EGNA achieves a slightly lower MAE (1.190), its lower correlation and higher RMSE suggest reduced consistency across the dataset. Traditional machine learning models such as RF-Score v3 perform markedly worse ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0.532$$\end{document}$ , RMSE = 1.917), demonstrating the superiority of deep learning-based feature extraction in generalization tasks.Table 4. Performance of DeepGGL on CSAR-NRC-HiQ DatasetMethod $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p \uparrow $$\end{document}$ RMSE $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\downarrow $$\end{document}$ MAE $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\downarrow $$\end{document}$ DeepGGL0.764****1.5121.211EGNA0.7501.5361.190PLEC-Linear0.7411.6561.242 $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {K}_{\textrm{DEEP}}$$\end{document}$ 0.7101.6301.257OnionNet0.6611.7141.319Pafnucy0.6201.9031.602The error metrics are in pKd units, and the value in bold indicates the best outcome for the respective metric. The performances of other models are adopted from [32]

To further assess the performance of DeepGGL on newer structural data, we evaluated it on the PDBbind v2019 hold-out set, as used in prior studies [43]. This set comprises 222 protein-ligand complexes from the PDBbind v2019 refined set that were deposited after the PDBbind v2016 release and are therefore disjoint from the CASF-2016 benchmark. This dataset provides a realistic simulation of predicting binding affinity for novel and unseen protein-ligand structures, making it highly relevant for prospective applications.

DeepGGL again demonstrates strong generalization, outperforming other models in all three evaluation metrics on this dataset (Table 5). It achieves the highest $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p$$\end{document}$ of 0.557, along with the lowest RMSE (1.267) and MAE (1.001). Compared to DeepFusion ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0.545$$\end{document}$ ) and Pafnucy ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0.528$$\end{document}$ ), DeepGGL exhibits better predictive power and lower prediction errors. The performance gap is even more pronounced against older models such as $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {K}_{\textrm{DEEP}}$$\end{document}$ ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p = 0.487$$\end{document}$ , RMSE = 1.424), confirming that DeepGGL’s multi-range geometric encoding enables better generalization to evolving protein-ligand structures.

Across both independent test sets, DeepGGL consistently achieves superior performance, reflecting its ability to generalize well beyond the training distribution. The results validate that DeepGGL’s design-leveraging geometric and algebraic graph-based features across multiple interaction ranges-is robust to structural novelty and dataset drift, a critical advantage for practical deployment in computational drug discovery pipelines.Table 5. Performance of DeepGGL on PDBbind v2019 Hold-Out setMethod $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p \uparrow $$\end{document}$ RMSE $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\downarrow $$\end{document}$ MAE $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\downarrow $$\end{document}$ DeepGGL0.5571.2671.001DeepFusion0.5451.3381.074Pafnucy0.5281.3811.106 $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {K}_{\textrm{DEEP}}$$\end{document}$ 0.4871.4241.135The error metrics are in pKd units, and the value in bold indicates the best outcome for the respective metric. The performances of other models are adopted from [43]

Model ablation study

To better understand the contribution of key architectural components in DeepGGL, we conducted an ablation study by systematically removing the residual block and attention mechanism (architectures are in the supporting information). The analysis was performed across four benchmark datasets: CASF-2016, CASF-2013, CSAR NRC-HiQ, and the PDBbind v2019 hold-out set. Table S1 summarizes the performance in terms of the Pearson correlation coefficient ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_p$$\end{document}$ ), reflecting the quality of predicted binding affinities relative to experimental values.

Effect of residual connection Removing the residual connection from DeepGGL leads to a consistent but modest drop in performance across all datasets. Specifically, the correlation on CASF-2016 decreased from 0.868 to 0.857, and on the PDBbind v2019 hold-out set from 0.557 to 0.547. These results highlight the residual block’s role in facilitating deeper network architectures by enabling better gradient flow and mitigating the vanishing gradient problem. Its inclusion appears particularly important for datasets like PDBbind v2019, which contain more structurally diverse and recent complexes, suggesting that residual connections help the model generalize to complex unseen data.

Effect of attention mechanism To assess the contribution of the attention mechanism, we removed it while retaining all other components. Interestingly, DeepGGL without attention slightly outperformed the full model on CASF-2013 (0.845 vs. 0.844) and CSAR NRC-HiQ (0.769 vs. 0.764), albeit by a narrow margin. However, on CASF-2016 and PDBbind v2019 hold-out, performance decreased slightly, especially in the latter case (0.544 vs. 0.557). These observations indicate that the attention mechanism is most beneficial in scenarios where complex spatial patterns and long-range dependencies are critical, such as newer datasets with more diverse structures. Its effect may be dataset-dependent, with diminishing returns in benchmark sets that exhibit more uniform interactions.

Effect of removing both components When both the residual block and the attention mechanism are removed, the model shows a clear degradation in performance across all benchmarks. The Pearson correlation drops to 0.861 on CASF-2016 and to 0.507 on the PDBbind v2019 hold-out set-an appreciable loss of predictive power compared to the full model. This decline confirms that these components work synergistically to enhance the model’s capacity to capture non-linear and hierarchical interactions across spatial scales.

Overall, both the residual connection and attention mechanism play important roles in enhancing DeepGGL’s performance and generalization ability. The residual block contributes to stable and deep learning, especially in structurally diverse scenarios, while the attention mechanism provides adaptability in capturing long-range and context-dependent geometric features. Their combined effect yields the highest performance across the majority of benchmarks, affirming their design relevance in the DeepGGL architecture.

Case study

Following a similar strategy as in [16], we applied our model to predict potential drug candidates targeting Human Immunodeficiency Virus Type 1 (HIV-1, PDB ID: 4JMU). Three-dimensional structures of approved drugs were obtained from the DrugBank database (https://go.drugbank.com/) and docked to the target using AutoDock Vina [44] to generate the protein-ligand complexes required as input for our model. Compounds were then ranked according to their predicted binding affinities by DeepGGL, and the top ten candidates are listed in Table 6.

Among these top candidates, Imiquimod, Aprepitant, and Tacrine have reported anti-HIV activity in PubMed, supporting the biological relevance of the model’s predictions. The remaining compounds have not been experimentally validated for HIV-1 inhibition; their identification illustrates the model’s ability to prioritize compounds based on predicted binding affinity, providing a focused set of candidates for further experimental investigation.

A key limitation of our approach is its dependence on pre-docked protein-ligand complexes. Unlike sequence- or structure-based affinity prediction models that can operate solely from individual ligand or receptor structures, our model requires accurate docking poses as input. Consequently, the predictive performance is influenced by the quality of docking, and potential biases or inaccuracies in docking can propagate to the affinity predictions. While this constraint ensures that the model evaluates realistic binding interactions, it also limits applicability in large-scale virtual screening when docking all candidates is computationally expensive. Future work may focus on integrating docking-free representations or combining rapid approximate docking with our model to broaden its utility in high-throughput settings.Table 6. Top ten predicted potential therapeutic compounds against HIV-1RankDrugBank IDDrug nameEvidence1DB12245TriclabendazoleUnconfirmed2DB00724Imiquimod****PMID: 145303573DB00555LamotrigineUnconfirmed4DB09343TipiracilUnconfirmed5DB16965PBT-434Unconfirmed6DB08882LinagliptinUnconfirmed7DB00673Aprepitant****PMID: 180408258DB00382Tacrine****PMID: 13588329DB14200ThiohexamUnconfirmed10DB09242MoxonidineUnconfirmedCompounds shown in bold have supporting evidence identified in PubMed

Conclusion

In this study, we introduced DeepGGL, a deep convolutional neural network designed to predict protein-ligand binding affinity with high accuracy. By incorporating bipartite weighted subgraph descriptors at multiple interaction ranges, DeepGGL effectively captures both geometric and chemical features of protein-ligand complexes. Comprehensive benchmarking on CASF-2013 and CASF-2016 datasets demonstrates that DeepGGL outperforms existing state-of-the-art methods. Further validation on independent datasets, including CSAR NRC-HiQ and the PDBbind v2019 hold-out set, confirms the model’s strong generalization capabilities. The robustness of DeepGGL to structural diversity and dataset shift underscores its practical applicability in real-world drug discovery settings. Ablation studies further reveal that architectural components such as residual connections and attention mechanisms are instrumental in enhancing model performance. Overall, DeepGGL represents a significant step forward in structure-based affinity prediction, offering a reliable and scalable approach for accelerating computational drug design.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Information 1.

Bibliography6

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Eldridge MD, Murray CW, Auton TR, Paolini GV, Mee RP. Empirical scoring functions: I. the development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes. J Comput-aided Mol Des 11(5), 425–445 (1997)10.1023/a:10079961245459385547 · doi ↗ · pubmed ↗
2Luo D, Zhou J, Xu L, Yuan S, Lin X. Dynamicdta: Drug-target binding affinity prediction using dynamic descriptors and graph representation. Interdis Sci: Comput Life Sci 1–16 (2025).10.1007/s 12539-025-00729-z 40481301 · doi ↗ · pubmed ↗
3Xia C, Feng SH, Xia Y, Pan X, Shen HB. Leveraging scaffold information to predict protein–ligand binding affinity with an empirical graph neural network. Brief Bioinform. 2023;24(1).10.1093/bib/bbac 60336627113 · doi ↗ · pubmed ↗
4Li Y, Han L, Liu Z, Wang R. Comparative assessment of scoring functions on an updated benchmark: 2. Evaluation methods and general results. J Chem Inf Model 2014;54(6):1717–1736.10.1021/ci 500081 m 24708446 · doi ↗ · pubmed ↗
5Luong MT, Pham H, Manning CD. Effective approaches to attention-based neural machine translation. ar Xiv preprint ar Xiv:1508.04025 2015.
6Wójcikowski M, MichałKukiełka Stepniewska-Dziubinska MM, Siedlecki P. Development of a protein–ligand extended connectivity (plec) fingerprint and its application for binding affinity predictions. Bioinformatics 2019;35(8):1334–1341.10.1093/bioinformatics/bty 757PMC 647797730202917 · doi ↗ · pubmed ↗