GNN-MA: Soft Molecular Alignment with Cross-Graph Attention for Ligand-Based Virtual Screening

Keling Liu; Dongmei Wei; Rui Shi; Zhiyuan Zhou

PMC · DOI:10.3390/molecules31060991·March 16, 2026

GNN-MA: Soft Molecular Alignment with Cross-Graph Attention for Ligand-Based Virtual Screening

Keling Liu, Dongmei Wei, Rui Shi, Zhiyuan Zhou

PDF

Open Access

TL;DR

This paper introduces GNN-MA, a new model for drug discovery that improves molecule screening using graph-based alignment without needing 3D structures.

Contribution

GNN-MA introduces cross-graph attention and bond-to-atom aggregation for soft molecular alignment in ligand-based virtual screening.

Findings

01

GNN-MA achieves competitive ROC-AUC scores on DUD-E and LIT-PCBA datasets.

02

It improves early-enrichment metrics (EF@1–5%) on DUD-E compared to ablated variants.

03

The model provides interpretable atom-level alignment insights in case studies.

Abstract

Ligand-based virtual screening (LBVS) seeks strong early enrichment when searching ultra-large libraries, but practical screening often relies on 1D/2D descriptions while 3D information is expensive and uncertain due to conformer generation and alignment. We propose GNN-MA, a retrieval-style pairwise scoring model for query–candidate molecular pairs that uses molecular graphs as a unified representation. Built on intra-graph message passing, GNN-MA adds cross-graph attention to learn atom-level soft alignment that focuses on key substructures relevant to activity matching, and introduces a bond-to-atom semantic aggregation module to better exploit chemical bond cues for similarity scoring. The framework uses 2D molecular graphs derived from SMILES for retrieval-style matching and does not rely on explicit 3D conformational modeling or alignment. Experiments on DUD-E and LIT-PCBA show…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Chemicals1

GNN-MA

Figures5

Click any figure to enlarge with its caption.

Equations2

Funding1

—Sichuan Higher Education Research Association Project

Keywords

virtual screeningscaffold hoppingmolecular representationgraph attentionearly enrichment

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsComputational Drug Discovery Methods · Machine Learning in Materials Science · Advanced Graph Neural Networks

Full text

1. Introduction

Ligand-based virtual screening (LBVS) is a widely used prioritization strategy in early-stage drug discovery [1,2,3,4,5,6]. Its goal is to identify potential actives as early as possible from ultra-large candidate libraries [7,8,9,10,11]. Because experimental follow-up typically covers only a small top-ranked subset of compounds (often 0.1–5%), early enrichment often reflects practical screening value better than overall ranking performance [2,12,13,14,15,16]. Meanwhile, inputs in real screening workflows are frequently incomplete and heterogeneous: in most cases only 1D/2D descriptions are available, and 3D-related information is not always accessible; even when it is, conformer generation, conformer selection, and spatial alignment introduce extra cost and uncertainty [13,17,18,19,20,21,22]. Therefore, in this work we focus on a 2D graph-based formulation that avoids explicit 3D conformational modeling while still aiming at stable early enrichment under realistic screening conditions [1,2,6,7,8,9,10,11,15,16].

Existing LBVS methods can be roughly grouped into three categories [1,2,5,6,23]. The first category consists of traditional approaches built on 2D fingerprints and similarity measures, such as MACCS [24] and ECFP4 [25], which are computationally efficient and easy to scale; however, these representations inevitably compress structural information and struggle to explicitly capture fine-grained correspondences between a query and a candidate molecule, so early enrichment can be limited when the two molecules differ substantially in scaffold, or when activity is driven by only a few critical local fragments [2,6,23,24,25,26,27]. The second category includes 3D shape- or pharmacophore-matching methods (e.g., PheSA [19] and ROSHAMBO [20]), which can describe spatial similarity more directly; yet they typically rely on upstream procedures such as conformer generation and alignment, increasing computational cost and making performance sensitive to conformer quality and input settings, which in turn restricts their stable use on ultra-large libraries or in scenarios where 3D information is missing [9,12,13,17,18,19,20,28]. The third category comprises deep learning approaches to molecular representation learning [29,30,31,32,33,34,35,36,37,38,39,40], especially graph neural networks (GNNs) and models that incorporate 3D geometric information (e.g., SchNet and DimeNet++), which can learn richer structural semantics; nevertheless, many of these methods are primarily designed for single-molecule property modeling and pay insufficient attention to the query–candidate retrieval-style matching required by LBVS [3,41,42,43,44,45,46]. As a result, there is still clear room to improve how we explicitly model local correspondences and cross-molecule interactions—without relying on rigid alignment—so that the learning objective better serves early-enrichment-oriented screening and provides chemically meaningful rationales for prioritization [1,2,23,26,27,47,48,49].

To address these issues, we explicitly formulate LBVS as a query–candidate pairwise scoring-and-ranking problem and propose GNN-MA. The method uses the molecular graph as a unified representation: it first learns structural semantic embeddings via intra-molecular message passing, and then introduces a cross-graph attention mechanism to model inter-molecular interactions explicitly at the atom level. This enables atom-wise cross-molecule matching that highlights the key substructures most responsible for activity transfer and ranking decisions, providing a qualitative cue for interpreting which local fragments contribute to the matching score in case studies [3,26,44,45,46,48,49,50]. Rather than claiming novelty from the use of cross-graph attention alone, our main technical contribution lies in adapting alignment-aware graph comparison specifically to LBVS. In particular, the proposed framework combines bond-aware representation enhancement through edge fusion and bond-to-atom aggregation with a ranking-oriented training objective that is explicitly designed to improve early enrichment in target-specific virtual screening. In addition, we enhance atom representations through semantic aggregation of chemical bond information, strengthening the expressiveness of similarity estimation [41,42,43,48,50]. Importantly, GNN-MA operates on standard 2D molecular graph representations and does not depend on rigid 3D alignment procedures [2,6,13,17,18,19,20,42,43].

The main contributions of this work are as follows:

We cast ligand-based virtual screening as a retrieval-style query–candidate pairwise scoring task, aligning model learning more closely with early-enrichment-oriented screening objectives.
We propose an alignment-aware graph matching mechanism based on cross-graph attention to capture atom-level correspondences between molecules in a fully 2D representation space.
We design a bond-aware and ranking-oriented learning framework by combining edge fusion, bond-to-atom aggregation, and a within-target ranking constraint for improved early retrieval.
We provide a practical empirical study on DUD-E and LIT-PCBA with macro-aware aggregation views, per-target analysis, and efficiency evaluation for shortlist re-ranking.

2. Results and Discussion

2.1. Overview of Evaluation

We evaluated GNN-MA on two widely used ligand-based virtual screening (LBVS) benchmarks, DUD-E and LIT-PCBA. Performance was assessed using ROC-AUC [51] for overall discrimination and early enrichment EF@k% (k = 1, 2, 5, 10, 20) for top-ranked retrieval quality.

To avoid ambiguous reporting and to address target-size imbalance, we report results under four complementary aggregation views (definitions in Section 3.6): (i) global average (pooled), (ii) macro-average, (iii) weighted macro-average, and (iv) macro-target statistics. Because global average results can be disproportionately influenced by a few large targets, the main text focuses on macro-average, weighted macro-average, and macro-target evidence, whereas the corresponding global average summaries, detailed per-target tables, and supplementary statistical results are provided in the Supporting Information.

2.2. Results on DUD-E

2.2.1. Overall Discriminative Performance (ROC-AUC)

Figure 1 compares ROC-AUC on DUD-E across six models under the macro-average and weighted macro-average views. GNN-MA achieves the best overall discriminative performance, while GNN-MA-intra shows a clear drop, indicating that cross-graph interaction contributes positively to discrimination. GMN [52], Siamese_GNN [53], and DeepChem provide strong learning-based baselines but remain below GNN-MA, whereas ECFP4 performs substantially worse. The close agreement between macro and weighted macro AUC indicates that the ranking of methods is stable under these macro-based summaries.

2.2.2. Early Enrichment Performance (EF@k%) on DUD-E

Table 1 and Table 2 show that GNN-MA delivers stable early enrichment performance on DUD-E in both weighted macro-average and macro-average views. Its advantage is not limited to a single cutoff: the model remains strong from EF@1% through EF@20%, indicating that the gain is sustained across practical shortlist sizes rather than confined to only the very top-ranked molecules. The gap relative to GNN-MA-intra is consistent across all reported cutoffs, supporting the contribution of cross-graph interaction to retrieval quality. Compared with GMN, Siamese_GNN, and DeepChem, GNN-MA stays at the top or within the top tier under both aggregation views, whereas ECFP4 remains clearly inferior.

2.3. Results on LIT-PCBA

2.3.1. Discriminative Performance (ROC-AUC)

Figure 2 indicates that GNN-MA maintains competitive discriminative performance on the more challenging LIT-PCBA benchmark and retains a clear advantage over GNN-MA-intra. GMN, Siamese_GNN, and DeepChem also remain competitive, whereas ECFP4 performs less favorably. Given the substantial variation in target sizes in LIT-PCBA, we focus primarily on macro-based summaries in the main text.

2.3.2. Early Enrichment Performance (EF@k%) on LIT-PCBA

Table 3 and Table 4 indicate that on the more heterogeneous LIT-PCBA benchmark, the benefit of GNN-MA is most pronounced at very early cutoffs. Under both weighted macro-average and macro-average summaries, the largest margins over GNN-MA-intra appear at EF@1–5%, which is particularly relevant for top-priority retrieval in practical screening. This pattern is consistent with the stronger target-size imbalance of LIT-PCBA, where macro-based summaries are more informative than pooled results and performance is more target-dependent than on DUD-E. Relative to GMN, Siamese_GNN, and DeepChem, GNN-MA remains highly competitive across cutoffs, while ECFP4 again shows clearly weaker enrichment.

2.4. Per-Target Consistency Analysis (Macro)

To further examine whether the observed performance gains could be consistently reflected across targets, we analyzed the per-target differences between GNN-MA and several baseline models, as illustrated in Figure 3. For each target, ROC-AUC and EF@1% were computed independently on the target-specific test set, and the differences between GNN-MA and the corresponding baselines were then calculated. The resulting ΔAUC and ΔEF@1% values therefore indicate how much GNN-MA improves or decreases performance relative to a baseline for individual targets.

Figure 3A,B show the distributions of per-target improvements for DUD-E. where green denotes comparisons against DeepChem and the intra-graph variant (GNN-MA-intra), and blue denotes comparisons against GMN and Siamese_GNN. Figure 3A presents ROC-AUC improvements, and Figure 3B presents EF@1%

The distributions are generally shifted toward positive values when comparing GNN-MA with DeepChem and the intra-graph variant (GNN-MA-intra), indicating that the performance gains are broadly observed across targets rather than driven by a small number of cases. In contrast, the differences relative to GMN and Siamese_GNN are smaller, suggesting that these graph-pair baselines remain competitive while GNN-MA maintains favorable target-level consistency.

Figure 3C,D present the corresponding results for LIT-PCBA. Because target sizes in LIT-PCBA are highly imbalanced (see Supplementary Materials Table S2), per-target analysis is particularly informative. The distributions suggest a generally positive trend in EF@1% relative to the baselines, while ROC-AUC remains broadly competitive at the target level.

2.5. Efficiency and Scalability

In large-scale virtual screening, inference efficiency is as critical as accuracy. To quantify deployability in large-library retrieval scenarios, we measured throughput under different CPU/GPU settings and separated preprocessing time from model inference time (Table 5).

The results show that (as shown in Table 6), in the Forward-only (pure forward inference) setting, GPU reached 280.2 pairs/s, far higher than CPU’s 9.1 pairs/s, indicating that the forward computation is highly accelerator-friendly. In the end-to-end (E2E) evaluation, enabling caching boosted GPU throughput to 231.6 pairs/s, notably higher than 161.4 pairs/s under the Online mode, suggesting that preprocessing and data pipeline overheads are key factors for E2E performance. On CPU, E2E throughput was about 8–9 pairs/s, close to its Forward-only results, implying that CPU E2E is mainly constrained by overall compute.

Overall, caching and batching can substantially improve end-to-end throughput, supporting the use of GNN-MA as a refinement model for shortlist re-ranking in large-library screening workflows.

2.6. Soft-Alignment Visualization

To qualitatively illustrate the matching patterns learned by the model, we present a soft-alignment case study using a query–candidate pair from the KAT2A target in DUD-E. Figure 4A visualizes attention weights between atom pairs: instead of spreading across the entire matrix, high weights concentrate in a few regions, suggesting the model tends to lock onto key fragments during matching.

Figure 4B illustrates a structural alignment sketch: the highest-weight atom correspondences from Figure 4A are mapped back onto the molecular structures and annotated with links, where darker/more solid lines indicate higher weights. Different atom colors follow the standard molecular visualization convention and denote different element types. The links cluster around a few crucial local regions rather than scattering randomly. This pattern indicates that cross-graph attention highlights a small number of locally focused correspondences, which can be qualitatively inspected to understand which fragments contribute most to the matching score in this example.

2.7. Summary

In summary, this chapter provides a systematic evaluation of GNN-MA from five aspects: overall performance, early enrichment, cross-target robustness, inference efficiency, and qualitative alignment visualization. Taken together, the results indicate that GNN-MA combines solid discriminative performance with stronger very-early retrieval behavior, while the per-target and efficiency analyses further clarify where these advantages are most evident and how the model may be used in practical re-ranking settings. Meanwhile, per-target analysis and the visualization case study provide qualitative support for the model’s learned matching behavior. Throughput benchmarking also suggests practical deployability, and indicates that caching and batching can markedly improve end-to-end screening efficiency.

3. Materials and Methods

3.1. Study Overview and Problem Definition

We formulated ligand-based virtual screening (LBVS) as a query–candidate pairwise scoring task. Given a query molecule $[eqn]$ and a candidate molecule $[eqn]$ under the same target, we learned a differentiable scoring function $[eqn]$ so that active compounds in the candidate library receive higher scores and are ranked earlier.

Training data were constructed in pairs: ligand–active pairs were treated as positives, and ligand–decoy/inactive pairs as negatives. The model output a continuous matching score to rank candidates, and we reported ROC-AUC and early-enrichment metrics (EF@k%) during testing.

3.2. Datasets and Preprocessing

3.2.1. Datasets and Molecular Representation

We evaluated our method on two standard LBVS benchmarks: DUD-E [14] and LIT-PCBA [16]. Both datasets are organized by target, and each target contains active molecules and inactive molecules. DUD-E originally included 102 targets; after removing 2 targets that failed parsing, we retained 100 targets. LIT-PCBA contains 15 targets, and all were used in this study.

Molecules were represented as 2D heavy-atom graphs derived from SMILES. We did not impose a hard limit on the number of atoms; variable-sized molecules were handled via in-batch padding and masking.

3.2.2. Data Split and Evaluation Protocol

To ensure fairness, reproducibility, and leakage-free evaluation, we adopted a unified molecule-level splitting and evaluation protocol.

(1)Molecule-level split. For each target t, molecules were split into train/validation/test subsets with a ratio of 8:1:1 at the molecule level. The split satisfies disjointness:

[eqn]

[eqn]

A fixed random seed was used and the same split was reused across all experiments. All models in this work were trained and evaluated under an identical partition.

(2)Pair construction. Within each target, ligand–active pairs form positive samples and ligand–decoy/inactive pairs form negative samples. Pair construction strictly respected subset boundaries, preventing pair-level leakage. During evaluation, all molecules used for scoring were taken exclusively from the test split of each target. Pairwise scores were computed only between molecules within the same target-specific test subset, and the resulting scores were ranked to compute ROC-AUC and EF@k%.(3)Evaluation views. During testing, ranking was performed independently within each target. We report: (i) pooled metrics that aggregate per-target statistics; and (ii) per-target metrics and robustness analysis. Formal metric definitions are provided in Section 3.6.

3.3. Molecular Graph Representation and Features

We represented a molecule as a graph $[eqn]$ , where $[eqn]$ is the atom feature matrix (with $[eqn]$ as the node feature dimension), and $[eqn]$ is the bond feature tensor (with $[eqn]$ as the edge feature dimension). To reflect the incomplete and heterogeneous inputs in real-world LBVS, we adopted a dimension-agnostic feature organization strategy: we used 2D topology and atom/bond attributes as the input features throughout all experiments. The specific atom and bond features we used are listed in Table 7 and Table 8.

3.4. GNN-MA Model Architecture

As illustrated in Figure 5, GNN-MA comprises four components: (i) Intra-Graph Message Passing, (ii) Cross-Graph Attention for Soft Alignment, (iii) Edge Fusion and Bond-to-Atom Aggregation, and (iv) graph-level pooling followed by an MLP to output the final pairwise matching score.

3.4.1. Intra-Molecular Message Passing

To learn intra-molecular structural semantics, GNN-MA performs $[eqn]$ -th layer message passing on the query graph $[eqn]$ and the candidate graph $[eqn]$ separately. Given the ( $[eqn]$ -1)th layer node embedding $[eqn]$ and the edge (bond) representation $[eqn]$ between nodes $[eqn]$ and $[eqn]$ , node $[eqn]$ aggregates messages from its neighborhood $[eqn]$ as follows:

[eqn]

The node embeddings are then updated via the following function:

[eqn]

Here, $[eqn]$ and $[eqn]$ are learnable mappings (e.g., multilayer perceptrons) that integrate neighborhood atom and bond information into the node representation. To improve training stability and representation capacity, we further apply residual connections and normalization between layers.

3.4.2. Cross-Graph Attention for Soft Alignment

Independent encoding of each molecule is typically insufficient to extract the pairwise matching cues required for retrieval. Therefore, GNN-MA builds upon the intra-molecular representations with a cross-graph attention module that explicitly captures atom- and bond-level interactions between the query and candidate, producing an explicit soft-alignment matrix.

Let $[eqn]$ and $[eqn]$ , denoting the atom representations of the two compounds after intra-molecular message passing, where $[eqn]$ and $[eqn]$ are the numbers of atoms and $[eqn]$ is the embedding dimension. We use scaled dot-product attention to compute the cross-graph relevance between the $[eqn]$ -th atom in the query molecule $[eqn]$ and the $[eqn]$ atom in the candidate molecule $[eqn]$ :

[eqn]

Here, $[eqn]$ and $[eqn]$ are learnable parameters, and $[eqn]$ is the scaling factor used in the attention mechanism.

For each query atom $[eqn]$ , we normalize across all candidate atoms $[eqn]$ by applying a softmax along the candidate atom dimension, yielding the soft alignment weights from query to candidate:

[eqn]

Based on these weights, query atom $[eqn]$ aggregates information from the candidate molecule to obtain a cross-graph contextual representation:

[eqn]

To enhance the symmetry of alignment and the complementarity of information, we similarly compute the reverse attention $[eqn]$ and obtained $[eqn]$ .

Analogously, we also construct cross-graph attention at the bond level: the correlation computation, normalization, and cross-graph aggregation follow the same procedure as the atom-level attention, except that the inputs are changed from atom representations to bond representations.

3.4.3. Edge Fusion and Bond-to-Atom Aggregation

The atom-level interaction representations produced by cross-graph attention provide evidence of soft correspondences between the query and candidate. Building on this, GNN-MA introduces a two-stage structural enhancement prior to node updating, namely edge-level fusion → edge-to-node aggregation: we first fuse information at the bond (edge) level to obtain enhanced edge representations, and then aggregate these enhanced edge features to adjacent atoms. In this way, bond semantics are injected into atom representations and subsequently contribute to the final scoring.

(1)Edge-level Fusion:

For each chemical bond, we construct a fused edge representation by combining the cross-graph-updated representations of its two incident atoms with the original bond feature:

[eqn]

Here, $[eqn]$ is a learnable mapping that integrates node semantics and bond information to produce an enhanced bond representation.

(2)Edge-to-node aggregation

After obtaining the fused edge representations, we aggregate information from incident edges for each atom to form an edge-aggregation vector:

[eqn]

Then, we inject the aggregated bond information into the atom representation through a learnable transformation and an update operation:

[eqn]

Here, $[eqn]$ is a learnable mapping that transfers bond-level information to atom-level representations, facilitating subsequent fusion with atom features.

This design is particularly relevant to LBVS, where subtle local bond environments can affect functional similarity even when global graph topology appears similar.

3.4.4. Similarity Scoring and Ranking

For graph-level pooling, we fuse the molecular representations obtained from intra-graph convolution, cross-graph attention, and edge-to-node aggregation via a residual combination, and then apply a readout function to obtain graph-level embeddings for scoring:

[eqn]

For pairwise scoring, we combine the two graph-level embeddings and feed them into a multilayer perceptron to obtain the final matching score:

[eqn]

Here, || denotes feature concatenation.

First, the model outputs a matching score for each query–candidate molecular pair. During training, we used the binary cross-entropy loss as the primary supervision signal for active/decoy classification. To further emphasize early enrichment, we added a within-target batch-wise ranking constraint. Specifically, each mini-batch was constructed from molecules belonging to the same target. Within the current mini-batch, negative pairs are ranked according to their predicted similarity scores, and the top K highest-scoring negatives are selected as the hardest negatives (K = 10). The ranking term then encourages the scores of positive pairs to be higher than those of these selected hard negatives. In this work, the final objective is defined as:

[eqn]

where L_bce_ denotes the binary cross-entropy loss and L_rank_ denotes the batch-wise pairwise ranking loss. To stabilize training, the ranking term is activated only after a short warm-up period: λ = 0 for the first two epochs and λ = 0.05 for the remaining epochs.

Compared with a pure pairwise classification objective, this ranking-oriented design better reflects the practical goal of LBVS, namely prioritizing active candidates at the top of the ranked list.

3.5. Baselines and Ablation Settings

To assess how a general 2D molecular graph classifier performs at this task, we chose DeepChem’s GraphConvModel [46,54], GMN [52], Siamese_GNN [53], and ECFP4 [25] as representative comparison methods. GraphConvModel serves as a general molecular graph classification baseline, learning structural representations directly from 2D molecular graphs and producing a binary classification score. GMN and Siamese_GNN provide pairwise graph-matching and similarity-learning baselines, respectively, while ECFP4 serves as a classical fingerprint-based method. On both DUD-E and LIT-PCBA, all baselines were evaluated using the same data split and evaluation pipeline as our main model, and we report both pooled and per-target AUC and EF@k%. We also built an ablated variant, GNN-MA-intra. It matches GNN-MA exactly in data splits, loss functions, and hyperparameter settings, but removes the cross-graph attention module, ensuring a fair and reproducible comparison.

3.6. Evaluation Metrics and Statistical Reporting

3.6.1. ROC-AUC

Overall ranking quality was evaluated using the area under the receiver operating characteristic curve [51] (ROC-AUC). ROC-AUC measures the probability that a randomly selected active molecule receives a higher predicted score than a randomly selected inactive molecule. It is threshold-independent and reflects global discriminative ability.

3.6.2. Early Enrichment: EF@k%

To assess early retrieval performance under realistic screening constraints, we report the enrichment factor EF@k% [55] for k = 1, 2, 5, 10, and 20. For target $[eqn]$ , let $[eqn]$ denote the number of test molecules and $[eqn]$ the number of test actives. After ranking molecules by predicted score in descending order, the top-k% cutoff is defined as:

[eqn]

Let $[eqn]$ denote the number of actives within the top $[eqn]$ molecules. The target-wise enrichment factor is defined as:

[eqn]

We report $[eqn]$ with particular emphasis on EF@1%, which reflects very early enrichment.

3.6.3. Aggregation Views Across Targets

Because targets can differ substantially in candidate-set size and active ratio, we report four complementary aggregation views:

(1)Global average (pooled)

Predictions from all targets are pooled together before computing the metric. This view provides a single overall summary and aligns with pooled reporting conventions in many prior works. However, it can be dominated by a few large targets.

(2)Macro-average

Metrics are computed independently for each target and then averaged with equal weights across targets. This view reflects cross-target robustness by treating each target equally.

(3)Weighted macro-average

Per-target metrics are averaged with weights proportional to the number of evaluated molecules (or candidates) per target. This view lies between macro-average and global average and helps assess whether aggregate results are driven by a target-size imbalance.

(4)Macro-target statistics

Metrics are analyzed at the target level to assess consistency across targets. We summarize per-target performance using distributions, win/loss/tie counts, and paired statistical tests on per-target differences.

3.6.4. Per-Target Improvement and Significance Testing

For a given metric M (ROC-AUC or EF@1%), we defined the per-target improvement over a baseline as

[eqn]

We estimated uncertainty of the mean improvement using a nonparametric bootstrap over targets (B = 20,000 resamples) and report percentile-based 95% confidence intervals.

To test whether improvements are systematically positive across targets, we applied a two-sided Wilcoxon signed-rank test to the set of per-target differences { $[eqn]$ }. We additionally report win/loss/tie counts, where a win indicates $[eqn]$ > 0, a loss indicates $[eqn]$ < 0, and a tie indicates $[eqn]$ = 0.

In the main text, EF tables are reported under macro-average and weighted macro-average views, whereas global average metrics are included as supplementary results.

3.7. Implementation Details and Throughput Benchmark

We implemented the model in PyTorch 2.6.0+cu126 and used Adam optimizer [56] for optimization. Key hyperparameters were: batch size = 32; epochs = 20; learning rate = 1 × 10^3^; weight decay = 1 × 10^−4^; dropout = 0.2; warm-up = 2; model size: hidden dim = 64; message passing layers = 3; gradient clipping max-norm = 5.0.

Hardware and software environment: OS = Windows 10 (10.0.26100); Python 3.11.9; PyTorch 2.6.0+cu126; CUDA 12.6; GPU = NVIDIA GeForce RTX 4060 Ti; CPU = Intel Core i5-14600K; RAM = 32 GB.

For the ranking loss, we set K = 10 hardest negatives per batch and λ = 0.05 after a two-epoch warm-up period. These values were kept fixed across all experiments.

4. Conclusions

We propose GNN-MA, a soft molecular alignment approach for ligand-based virtual screening (LBVS). Using molecular graphs as a unified representation, GNN-MA learns atom-level correspondences via cross-graph attention and, together with intra-graph message passing and bond-to-atom aggregation, explicitly injects cross-molecule matching evidence into similarity scoring and provides a qualitative interpretability cue in case studies.

Experiments on DUD-E and LIT-PCBA support the effectiveness of the proposed alignment-aware design: cross-graph interaction improves retrieval quality relative to the intra-graph variant, particularly in the early-ranking regime, while the accompanying per-target and throughput analyses suggest that the model is best suited for shortlist refinement rather than brute-force large-library screening. Throughput benchmarking further suggests strong deployment potential, showing that caching and batching can substantially improve end-to-end screening efficiency.

Future work will focus on more robust early-ranking objectives, improved negative sampling strategies, and validation on additional external benchmarks.

Bibliography56

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Tao C. Wang Y. Li X. Liu M. Advancements in Ligand-Based Virtual Screening through the Synergistic Integration of Graph Neural Networks and Expert-Crafted Descriptors J. Chem. Inf. Model.202565489849054036598510.1021/acs.jcim.5c 00822 PMC 12117557 · doi ↗ · pubmed ↗
2Sciabola S. Torella R. Nagata A. Boehm M. Critical Assessment of State-of-the-Art Ligand-Based Virtual Screening Methods Mol. Inform.202241 e 220010310.1002/minf.20220010335871608 · doi ↗ · pubmed ↗
3Du Y. Xing L. Zhang J. Chen Y. Graph Neural Networks in Modern AI-Aided Drug Discovery Chem. Rev.2025125100011010310.1021/acs.chemrev.5c 0046140959983 · doi ↗ · pubmed ↗
4Lim J.H. Sankararaman S. Iliopoulos-Tsoutsouvas C. Ernst M. Buelens F. Bissantz C. De Fabritiis G. Dominy B. Elliott P. Jiang D. Modeling protein-ligand interactions for drug discovery in the era of deep learning Chem. Soc. Rev.20255411141111834111701510.1039/d 5cs 00415 b · doi ↗ · pubmed ↗
5Thaingtamtanha T. Ravichandran R. Gentile F. On the application of artificial intelligence in virtual screening Expert Opin. Drug Discov.20252084585710.1080/17460441.2025.250886640388244 · doi ↗ · pubmed ↗
6Zhang X. Zhang H. Zhang Y. Wu H. Li J. Wang X. A review of deep learning methods for ligand based drug virtual screening Fundam. Res.2024471573710.1016/j.fmre.2024.02.01139156568 PMC 11330120 · doi ↗ · pubmed ↗
7Zhou G. Rusnac D.-V. Park H. Canzani D. Nguyen H.M. Stewart L. Bush M.F. Nguyen P.T. Wulff H. Yarov-Yarovoy V. An artificial intelligence accelerated virtual screening platform for drug discovery Nat. Commun.202415776110.1038/s 41467-024-52061-739237523 PMC 11377542 · doi ↗ · pubmed ↗
8Lyu J. Wang S. Balius T.E. Singh I. Levit A. Moroz Y.S. O’Donnell T.J. Tao P. Irwin J.J. Shoichet B.K. Modeling the expansion of virtual screening libraries Nat. Chem. Biol.20231971271810.1038/s 41589-022-01234-w 36646956 PMC 10243288 · doi ↗ · pubmed ↗