SuperMetal: a generative AI framework for rapid and precise metal ion location prediction in proteins

Xiaobo Lin; Zhaoqian Su; Yunchao Lance Liu; Jingxian Liu; Xiaohan Kuang; Peter T. Cummings; Jesse Spencer-Smith; Jens Meiler

PMC · DOI:10.1186/s13321-025-01038-9·July 15, 2025

SuperMetal: a generative AI framework for rapid and precise metal ion location prediction in proteins

Xiaobo Lin, Zhaoqian Su, Yunchao Lance Liu, Jingxian Liu, Xiaohan Kuang, Peter T. Cummings, Jesse Spencer-Smith, Jens Meiler

PDF

Open Access

TL;DR

SuperMetal is a new AI tool that quickly and accurately predicts where metal ions bind in proteins, helping with drug discovery and protein engineering.

Contribution

SuperMetal introduces a diffusion-based, SE(3)-equivariant generative model for precise and rapid metal ion prediction in proteins.

Findings

01

SuperMetal achieves 94% precision and 90% coverage in predicting zinc ion binding sites.

02

It localizes metal ions within 0.52 ± 0.55 Å of experimental positions.

03

The model operates in under 10 seconds for proteins with ~2000 residues.

Abstract

Metal ions, as abundant and vital cofactors in numerous proteins, are crucial for enzymatic activities and protein interactions. Given their pivotal role and catalytic efficiency, accurately and efficiently identifying metal-binding sites is fundamental to elucidating their biological functions and has significant implications for protein engineering and drug discovery. To address this challenge, we present SuperMetal, a generative AI framework that leverages a score-based diffusion model coupled with a confidence model to predict metal-binding sites in proteins with high precision and efficiency. Using zinc ions as an example, SuperMetal outperforms existing state-of-the-art models, achieving a precision of 94 % and coverage of 90 %, with zinc ions localization within 0.52 ± 0.55 Å of experimentally determined positions, thus marking a substantial advance in metal-binding site…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Chemicals3

zinc ions

zinc Metal

Figures6

Click any figure to enlarge with its caption.

Workflow of supermetal. The orange spheres represent the sampled Zn ions. The protein, shown in blue, is from 2 J9R in the RCSB Protein Data Bank

Precision versus coverage for SuperMetal and Metal3D at different probability cutoffs.The graph compares SuperMetal (purple line) and Metal3D (green line), illustrating how precision varies with coverage at different cutoff values. The labels indicate the probability cutoffs, *p* for the confidence model in SuperMetal and *t* for Metal3D

Distribution of mean absolute deviation (MAD) for SuperMetal and Metal3D at increasing probability cutoffs. This violin plot illustrates the MAD for metal ion location predictions by SuperMetal and Metal3D, with cutoffs ranging from lower to higher values for each model. SuperMetal is shown in purple and Metal3D in green. The plot employs kernel density estimation to display the data distribution, with the white circle indicating the median, the black box defining the first and third quartiles, and whiskers extending up to 1.5 times the interquartile range to capture the spread of typical data

Computational time analysis for SuperMetal and Metal3D across protein sizes. This scatter plot compares the computational time required by SuperMetal and Metal3D to predict metal-binding sites against the number of protein residues. Polynomial regression curves (purple and green dashed lines) are only used to clarify the trends

Comparative visualization of Zn ion binding site predictions for the proteins 5IN2 (1) and 6BTP (2). Zn ions are color-coded as follows: grey for experimentally determined Zn ions, cyan for Metal3D predictions, orange for SuperMetal predictions, and blue for AlphaFold 3 predictions. The protein structure is shown in green for the same input PDB file used in Metal3D and SuperMetal, while it is shown in yellow for AlphaFold 3. The transparent green region around the Zn ions highlights the protein’s atomic structure within a 5 Å radius of the metal ions. From left to right, the figure shows varyi

Fundamental Theory of the Score-based Generative Diffusion Model for Metal Ions in Proteins. The forward continuous-time stochastic differential equation (SDE) transitions the true locations of metal ions (top left) to random locations (top right). The score at each intermediate time step is predicted by a deep learning neural network, enabling the reverse process of the SDE. The grey part of the protein (top) represents the atomic structure of the protein surrounding the metal ions at their original positions

Funding3

—Vanderbilt Data Science Institute
—John R. Hall Professorship Endowment
—Humboldt Professorship of the Alexander von Humboldt Foundation

Keywords

Generative AIDiffusion modelMetal-binding sitesMetalloprotein

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsRNA and protein synthesis mechanisms · Protein Structure and Dynamics · Machine Learning in Bioinformatics

Full text

Introduction

The Protein Data Bank (PDB) contains nearly 200,000 structures, and approximately one-third of these proteins contain metal ions [1]. Many proteins require the binding of one or more metal ions to perform their functions. Zinc, a vital biologically active metal, is particularly noteworthy as it binds to approximately 10% of all human proteins [2]. These proteins rely on zinc for their biological function, structural stability, or regulation of activities. For example, zinc is indispensable for the activity of more than 300 enzymes, spanning all six enzyme classes [3]. The binding of zinc also stabilizes the folded conformations of protein domains, ensuring proper function [4]. Additionally, zinc ions play a crucial role in cell-cell communication, cell signaling, proliferation, survival, and DNA repair by modulating zinc-binding proteins [5]. In mammals, zinc homeostasis is primarily maintained by zinc transporter proteins, and zinc deficiency may lead to various health issues, such as smell and taste disorders, immune system disorders, developmental delays, and impaired immune function [6].

Given the importance and unique functionality of zinc in proteins, accurately identifying zinc-binding sites is crucial. However, direct determination of these sites through wet experiments is often costly and time-consuming due to technical difficulties, requiring expensive instruments, complex procedures, and elaborate labor. In contrast, computational approaches for discovering zinc-binding sites can significantly reduce the cost and time associated with corresponding wet experiments. Consequently, many computational methods have been developed to predict zinc-binding sites, which can be broadly categorized into four groups based on the attributes they consider: (1) template-based methods that leverage structural or sequence homologs and binding templates, e.g., MIB and MIB2 [7, 8]; (2) sequence-based methods that rely on amino acid sequence, e.g., M-Ionic [9]; (3) structure-based methods that use structural geometry and chemical features to identify metal-binding sites, e.g., Metal1D, Metal3D, and BioMetAll [10, 11]; and (4) physics-based methods, such as quantum mechanical/molecular mechanical (QM/MM) simulations [12, 13]. However, template-based methods are limited by the availability of existing templates or patterns and may struggle to accurately predict novel binding sites, while sequence-based methods lack the ability to offer atomic-level insights or detailed descriptions of protein-metal interactions. Physically modeling the dynamics of metal ion binding sites is also challenging, especially for sites located in the protein interior [14, 15]. Moreover, physical methods like molecular dynamics simulations struggle to find appropriate force fields for transition metals to reproduce correct coordination distances [11, 14, 16], while QM/MM simulations are too computationally expensive to be practical for typical protein design tasks. [12, 17, 18].

Current state of the art predictors for metal location is Metal3D [11], a structure-based method that employs 3D Convolutional Neural Networks (CNNs) to predict the positions of metal ions, such as zinc. Metal3D operates by taking a protein structure along with a specified set of residues as input, voxelizing the environment around each residue, and then predicting the per-residue metal density. Trained on experimental zinc sites, Metal3D generates a probability density map for metal ions, offering high precision with predictions within sub-angstrom precision. Despite its success, Metal3D faces challenges similar to other 3D CNN models [19–22], such as the need for fine grid spacing (voxelization). The computational cost for these voxel-based models increases cubically with the resolution of the input, making scaling up difficult [23]. Moreover, these CNN-based models are sensitive to the orientation of the input structure, requiring data augmentation to increase the number of training samples and reduce the risk of overfitting [11, 24]. Although Metal3D has demonstrated superior accuracy compared to other methods, issues like rotational invariance remain, highlighting the need for further advancements in metal ion prediction models [25, 26].

In recent years, diffusion models have emerged as a powerful generative AI technology with applications spanning natural language processing, image synthesis, and bioinformatics [27–29]. These diffusion models have significantly advanced fields such as computational protein design [30], drug and small-molecule development [31], protein–ligand interaction modeling [28, 29], cryo-electron microscopy data enhancement [32], and single-cell data analysis [33, 34]. In contrast, other generative approaches—such as variational auto-encoders (VAEs)—often require strict architectural constraints to maintain a tractable normalizing constant or must rely on surrogate objectives for approximate maximum likelihood training [35]. Meanwhile, generative adversarial networks (GANs) typically use adversarial training, which is notoriously unstable and prone to mode collapse [36]. By directly modeling the gradient of the log probability density, score-based diffusion models avoid many of these pitfalls, offering a more stable way to represent probability distributions.

In this paper, we introduce SuperMetal, a novel generative AI approach that combines a score-based diffusion model with equivariant graph neural networks to accurately predict zinc ion positions within protein structures. Instead of directly approximating the probability distribution of zinc ions, our model estimates the gradient of this distribution and then uses it to generate zinc positions from a normal distribution. These generated positions are then refined through a confidence model and clustered, resulting in a precise prediction of both the number and exact locations of zinc ions for a given protein structure. SuperMetal surpasses existing methods, achieving state-of-the-art results in the coverage of experimental zinc ions and the precision of predicted positions. This approach offers significant potential for applications in structural studies, binding site predictions, multi-body docking, and metalloprotein engineering.

Results and discussion

In this section, we present the evaluation and comparative results of SuperMetal against the state-of-the-art Metal3D in predicting zinc ion positions within protein structures. Both methods utilize the same test dataset to ensure a fair comparison. During the inference step of SuperMetal, 100 metal ions are sampled at random positions across the system and denoised via reverse diffusion over their translational degrees of freedom. A confidence model and clustering mechanism are subsequently applied to determine the final zinc ion binding sites for different proteins.

Overview of SuperMetal

The SuperMetal pipeline consists of three primary stages. The process begins with the preprocessing of protein structures into heterogeneous geometric graphs. These graphs consider node and edge features for protein residues, protein atoms, and metal ions, while also accounting for diffusion time at different timesteps [37, 38]. These geometric graphs are then fed into a score-based diffusion model to sample potential zinc ion positions. Following the sampling, an equivariant graph neural network is utilized to predict confidence scores for each candidate metal position. Positions with confidence scores below a specified threshold are subsequently filtered out. In the final stage, a clustering algorithm [39] is employed to calculate the centroids of neighboring metal positions, resulting in the placement of one metal ion per cluster.Fig. 1. Workflow of supermetal. The orange spheres represent the sampled Zn ions. The protein, shown in blue, is from 2 J9R in the RCSB Protein Data Bank

The detailed methodology for each of these steps is provided in Section “Methods” and the supplemental information, with a visual representation of the entire pipeline illustrated in Fig. 1. The key contributions of SuperMetal include:

A score-based diffusion model that processes geometric graphs of protein structures, enabling the sampling of metal positions within proteins.
An equivariant graph neural network that accurately evaluates each sampled point and filters out low-confidence positions.
Postprocessing operations designed to optimize prediction accuracy by clustering the predicted positions.

Comparison between SuperMetal and Metal3D

In the field of ligand-binding site prediction and water site prediction, 3D-CNN-based deep-learning models have demonstrated superiority over knowledge-based, sequence-based, physics-based, and traditional machine-learning methods [19, 20, 40]. Metal3D [11] has been reported to achieve state-of-the-art performance in predicting zinc ion binding locations. The tool was benchmarked against other existing methods, including Metal1D [11], BioMetAll [10], MIB [8], and MIB2 [41]. Metal3D operates by taking a protein structure along with a specified set of residues as input, voxelizing the environment around each residue, and then predicting the per-residue metal density. The predicted densities for each residue, within a 16 $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times$$\end{document}$ 16 $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times$$\end{document}$ 16 Å^3^ volume, are averaged to compute the overall zinc density across the protein. Clustering of voxels exceeding a defined probability threshold is then performed based on the global probability density. For each resulting cluster, a weighted average of the voxel positions is computed, using the probability at each point as the weight, ultimately placing one metal ion per cluster [11].

In contrast, SuperMetal employs a fundamentally different approach to predict metal ion binding locations. We conceptualize metal binding as a generative modeling problem–learning a distribution over metal translations given multiple known metal ion locations and the target protein structure. To address this, we developed SuperMetal, a diffusion generative model (DGM) that operates in the translational space of metal locations for metal binding prediction. This model defines a diffusion process over the translational degrees of freedom, specifically the positions of various metal ions relative to the protein. SuperMetal generates metal positions by running a learned reverse diffusion process, which iteratively transforms an initial, noisy prior distribution over metal positions into a refined model distribution (Fig. S2). Conceptually, this can be seen as the progressive refinement of random initial positions through successive updates of their translations. Additionally, we train a confidence model to estimate the likelihood of the metal positions generated by the DGM, allowing us to select the most probable samples. The final predicted metal locations are determined by calculating the cluster centers. The last two-step process significantly enhances prediction accuracy while maintaining a minimal impact on computational runtime.

It is important to note that SuperMetal operates in a continuous space, eliminating the need to voxelize protein environments, as is necessary with Metal3D. In Metal3D, the voxel grid size and resolution can significantly influence prediction accuracy and computational cost. Metal3D also requires rotational augmentation of the residue environment during voxelization to enhance data generalization and the training of machine learning models. In contrast, SuperMetal leverages the full 3D structure, allowing the scoring model to reason about physical interactions using an SE(3)-equivariant framework without the need for data augmentation [42], which enhances its ability to generalize to previously unseen complexes.

To assess the model’s performance, we calculated the precision versus coverage curve for SuperMetal and Metal3D (see Section “Evaluation and comparison” for details). Predictive models typically exhibit a trade-off between precision and coverage [43], which is consistent with our observations in Fig. 2. The curve in this figure is generated using different probability thresholds for both SuperMetal and Metal3D. SuperMetal demonstrates higher precision across a wider range of coverage compared to Metal3D. For example, when Metal3D achieves 100% precision, its coverage is around 30%, whereas SuperMetal reaches approximately 70% coverage at the same level of precision—more than double the coverage of Metal3D. Similarly, at 77% coverage, SuperMetal maintains near 100% precision, while Metal3D’s precision drops to around 93%. Moreover, at 88% coverage, Metal3D’s precision is $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sim$$\end{document}$ 84%, whereas SuperMetal achieves $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sim$$\end{document}$ 95%, marking a significant improvement. These results clearly indicate that SuperMetal outperforms Metal3D, providing higher precision even at greater coverage. This demonstrates SuperMetal’s ability to maintain high precision while significantly expanding the scope of its predictions.Fig. 2. Precision versus coverage for SuperMetal and Metal3D at different probability cutoffs.The graph compares SuperMetal (purple line) and Metal3D (green line), illustrating how precision varies with coverage at different cutoff values. The labels indicate the probability cutoffs, p for the confidence model in SuperMetal and t for Metal3D

In addition to precision and coverage in metal site prediction, it is crucial to assess the spatial precision of these predictions. To evaluate this, we measured the mean absolute deviation (MAD) between the experimentally determined zinc locations and their corresponding correctly predicted positions (true positives), as shown in Fig. 3. At a probability threshold of $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p = 0.1$$\end{document}$ , SuperMetal achieves a MAD of 0.61 ± 0.66 Å, which improves to 0.44 ± 0.58 Å as the threshold increases to $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p = 0.9$$\end{document}$ . This trend demonstrates that higher probability cutoffs lead to greater spatial precision, with the median MAD decreasing from 0.37 Å at $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p = 0.1$$\end{document}$ to 0.23 Å at $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p = 0.999$$\end{document}$ . The relatively small difference between these two median values suggests that even low-confidence predictions are spatially accurate within the protein structure. In contrast, Metal3D exhibits a clearly increasing median MAD, rising from 0.36 Å at $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t = 0.7$$\end{document}$ to 0.87 Å at $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t = 0.99$$\end{document}$ , indicating a greater deviation from ground-truth positions as the probability cutoff increases. Additionally, the spread of MAD values in SuperMetal decreases with higher probability cutoffs, opposite to the trend observed in Metal3D, where the spread increases.

These observations from the precision vs. coverage and MAD analyses suggest that SuperMetal not only demonstrates strong spatial precision across varying probability cutoffs, but also reveals that its improved precision in metal site prediction is closely tied to enhanced spatial precision, as indicated by a smaller MAD. In other words, as SuperMetal’s confidence in its predictions increases, so does its precision in pinpointing the exact locations of metal-binding sites within protein structures. This level of precision is particularly critical in applications where accurate metal ion positioning is essential, further highlighting the robustness and reliability of SuperMetal’s predictive capabilities.Fig. 3. Distribution of mean absolute deviation (MAD) for SuperMetal and Metal3D at increasing probability cutoffs. This violin plot illustrates the MAD for metal ion location predictions by SuperMetal and Metal3D, with cutoffs ranging from lower to higher values for each model. SuperMetal is shown in purple and Metal3D in green. The plot employs kernel density estimation to display the data distribution, with the white circle indicating the median, the black box defining the first and third quartiles, and whiskers extending up to 1.5 times the interquartile range to capture the spread of typical data values

SuperMetal not only outperforms Metal3D in terms of prediction accuracy but also demonstrates a significant advantage in processing speed. The computational runtime comparison as a function of protein size (number of residues) for SuperMetal and Metal3D is shown in Fig. 4. For consistent comparison, both models were executed using a single thread on one CPU core, with the same GPU. We observed that Metal3D’s runtime tends to increase exponentially as the protein size grows, whereas SuperMetal maintains consistently low runtimes (under 10 s), even for larger proteins. For example, when the protein size approaches 2000 residues, Metal3D requires approximately 500 s, which is around 60 times longer than SuperMetal.

This impressive performance improvement in SuperMetal can be attributed to the multi-scale approach that SuperMetal employs. Unlike Metal3D, SuperMetal efficiently manages computational complexity through a hierarchical interaction system. For instance, when metal ions are distant from protein residues, only coarse-grained interactions are considered; when metal ions are closer to the residues, the atomic structure of the residues is taken into account. This approach ensures that SuperMetal avoids constructing overly large and computationally expensive graphs, particularly between the metal and protein or within the protein itself. By limiting the scope of interaction calculations to only those that are most relevant and proximal, SuperMetal significantly enhances its computational efficiency. In contrast, Metal3D employs a different approach, utilizing voxelization and grid averaging for proteins. While this method is effective in certain contexts, it results in computation times that scale more directly with the number of residues. Consequently, as protein size increases, Metal3D experiences a much more rapid rise in runtime, leading to significantly longer processing times compared to SuperMetal, particularly as protein complexity increases.Fig. 4. Computational time analysis for SuperMetal and Metal3D across protein sizes. This scatter plot compares the computational time required by SuperMetal and Metal3D to predict metal-binding sites against the number of protein residues. Polynomial regression curves (purple and green dashed lines) are only used to clarify the trends

Case study

The recently published AlphaFold 3 [44] has garnered significant attention due to its ability to predict interactions within joint structures of complexes, such as protein-ligand and protein-ion interactions. In this case study, we compare our model, SuperMetal, against AlphaFold 3 and Metal3D, focusing on zinc-binding site prediction. For the evaluation, we selected two proteins with distinct zinc-binding sites: 5IN2, which represents the crystal structure of the extracellular Cu/Zn superoxide dismutase enzyme from Onchocerca volvulus [45], and 6BTP, the crystal structure of bone morphogenetic protein 1 (BMP1) complexed with a hydroxamate inhibitor [46].

One challenge we encountered with AlphaFold 3 is the need to specify the exact number of zinc ions for binding site prediction. In real-world scenarios, the number of ion-binding sites is often unknown, making this requirement impractical. To facilitate comparisons, we specify the number of input zinc ions for AlphaFold 3, using one, two, and six zinc ions for each protein (from left to right, as shown in Fig. 5). Another significant limitation of AlphaFold 3 is that its source code is not publicly available. This lack of access restricts users from modifying or customizing the tool to meet specific needs or integrate it with other workflows. Moreover, AlphaFold 3’s server only accepts sequence inputs, not direct PDB structures, which can be problematic when the protein structure is already known, as using the structure would be preferable in such cases. However, for the two protein structures used in the case study, we fortunately found that the structures predicted by AlphaFold 3 from sequences were quite similar to the experimentally determined PDB structures. This allowed us to use AlphaFold 3’s results for comparison with Metal3D and our model (see Fig. 5).

In the case of 5IN2, when we specified the experimentally determined number of zinc ions (i.e., 2), AlphaFold 3 accurately predicted the zinc-binding locations with 100% precision and 100% coverage (Fig. 5a, Table 1). However, this approach relies on prior knowledge of the exact number of zinc ions, which is not always feasible in real-world applications. For instance, when AlphaFold 3 was provided with only one zinc ion, the precision remained high (100%), but the coverage dropped significantly to 50%. Conversely, inputting six zinc ions into AlphaFold 3 led to multiple incorrect zinc-binding predictions (Fig. 5, Table 1). However, even when the correct number of zinc ions was specified, AlphaFold 3’s predictions were not always reliable: in the case of 6BTP, despite inputting the experimental determined number of zinc ions, both precision and coverage reach only 50%. In contrast, SuperMetal consistently achieved 100% precision and coverage in this scenario, demonstrating its robustness in handling such predictions.

Our results also show that SuperMetal outperformed Metal3D in both precision and coverage across the two case studies, reinforcing the findings from the previous section.Fig. 5. Comparative visualization of Zn ion binding site predictions for the proteins 5IN2 (1) and 6BTP (2). Zn ions are color-coded as follows: grey for experimentally determined Zn ions, cyan for Metal3D predictions, orange for SuperMetal predictions, and blue for AlphaFold 3 predictions. The protein structure is shown in green for the same input PDB file used in Metal3D and SuperMetal, while it is shown in yellow for AlphaFold 3. The transparent green region around the Zn ions highlights the protein’s atomic structure within a 5 Å radius of the metal ions. From left to right, the figure shows varying numbers of Zn ions specified in the AlphaFold 3 input, ranging from 1 Zn ion, to 2, and finally to 6 Zn ions

Table 1. Comparison of precision and coverage in predicting zinc ion binding sites using Metal3D, SuperMetal, and AlphaFold 35IN26BTP ModelPrecision (%)Coverage (%)Precision (%)Coverage (%)Metal3D335010050SuperMetal100100100100AlphaFold 3 (1 Zn ion)1005010050AlphaFold 3 (2 Zn ions)1001005050AlphaFold 3 (6 Zn ions)331001750

Conclusions

In this work, we introduced SuperMetal, a novel generative AI pipeline for predicting metal ion positions within protein structures. Our method leverages a score-based diffusion model to sample potential metal positions, followed by an equivariant graph neural network as a confidence model to evaluate and refine these predictions. The effectiveness of SuperMetal was tested against the state-of-the-art Metal3D model using a comprehensive dataset of zinc-binding sites.

SuperMetal demonstrated superior performance across all key metrics, including precision, coverage, and mean absolute deviation (MAD). Notably, SuperMetal consistently outperformed Metal3D by maintaining high precision while achieving significantly higher coverage, even at stricter thresholds. For instance, at a precision of $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sim$$\end{document}$ 100%, SuperMetal achieved nearly double the coverage compared to Metal3D, demonstrating its ability to identify a larger set of correct metal positions without compromising precision. Additionally, SuperMetal showed improved spatial precision, as reflected by consistently lower MAD values across all probability cutoffs. Beyond its predictive precision, SuperMetal also excels in scalability, maintaining low inference times across varying protein sizes. Unlike Metal3D, which exhibits an exponential increase in runtime with increasing protein size, SuperMetal efficiently handles large proteins using a multi-scale approach that reduces computational complexity. Furthermore, SuperMetal predicts metal-binding locations without requiring prior knowledge of the number of metal ions, a limitation present in AlphaFold 3, which relies on this information for its predictions. This flexibility adds further utility to the SuperMetal framework, distinguishing it from existing methods.

These results underscore the robustness and versatility of SuperMetal, establishing it as a valuable tool for accurately predicting metal ion positions in protein structures. The ability to reliably pinpoint these positions carries significant implications for understanding metal-dependent biological processes, protein stability, and enzyme catalysis.

Despite the promising results achieved by SuperMetal, several limitations remain. First, our metal-binding site predictions currently rely on data from the ZincBind dataset, which focuses on zinc-binding proteins. Second, our approach does not explicitly incorporate additional structural elements such as RNA, small molecules, or water. Addressing these issues is critical for ensuring broader applicability across diverse biological scenarios. Future work could involve expanding the dataset to include a wider range of metal ions and structural properties. Additionally, linking holo (bound) complexes with their corresponding apo (unbound) and predicted structures may further strengthen the model’s generalizability and support more realistic inference scenarios. In principle, our framework can readily be adapted to complex biological contexts. For example, we recently extended our generative AI framework and data curation strategies to predict water-binding sites in proteins, as well as protein–ligand and protein–protein complexes, demonstrating state-of-the-art performance in those domains [47].

Lastly, although our training data included proteins up to $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sim 3000$$\end{document}$ residues—covering the majority of biologically relevant metalloproteins, given that those exceeding 1500 residues are relatively rare [48]—SuperMetal supports structures of any size, provided sufficient memory is available to construct the required all-atom and coarse-grained graphs. For exceptionally large proteins or resource-constrained environments, users can segment the protein into smaller sections, run predictions piecewise, and then merge the results.

Methods

Data set and preprocessing

For this study, we utilize the ZincBind database [49], a high-quality, non-redundant collection of zinc-binding sites extracted from the RCSB Protein Data Bank (PDB) [50]. The database comprises 19,154 unique sites across 19,103 PDB files. To eliminate redundancy, ZincBind clusters zinc binding sites based on the structural similarity of zinc-binding chains and by comparing the amino acid sequences and types within the binding site. Furthermore, the database includes only zinc sites satisfying physiologically relevant criteria (e.g., at least two liganding residues and three liganding atoms), and accounts for symmetrical units in protein structures, thereby preventing the misclassification of essential catalytic or structural zinc ions as merely surface zinc ions [51]. From the 19,154 unique sites, we extracted 10,253 PDB files, each containing one or more of these sites. Structures with more than 3000 residues were excluded from the dataset. Additionally, all PDB structures were cleaned by removing any exogenous ligands, retaining only the zinc ions. In cases where multiple models were available for a given structure, such as those with alternative residue conformations, only the first model was used. We then randomly selected 1000 structures for validation. Separately, our test set comprises 350 structures, including those from the original Metal3D test set and additional structures randomly sampled from our non-redundant dataset. To ensure a fair comparison, none of these 350 test structures contain binding sites similar to those in the training sets for both SuperMetal and Metal3D. Testing was conducted under identical computational conditions using one CPU and one GPU (Nvidia A100 40 GB).

Diffusion model

Each data point in the diffusion model represents the three-dimensional coordinates of a metal ion’s position within a protein structure [27, 52, 53]. The goal of the generative model is to approximate the probability distribution $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(\textbf{x} \mid \textbf{y})$$\end{document}$ , where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{x}$$\end{document}$ refers to the metal positions and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{y}$$\end{document}$ denotes the protein structure [28]. Approximating $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(\textbf{x} \mid \textbf{y})$$\end{document}$ presents two main challenges.

The first challenge is that directly computing the probability distribution is intractable, as it requires normalizing the distribution across the entire space of possible positions. To circumvent this, instead of estimating $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(\textbf{x} \mid \textbf{y})$$\end{document}$ directly, the diffusion model is utilized to estimate its gradient, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\nabla _{\textbf{x}} \log p(\textbf{x} \mid \textbf{y})$$\end{document}$ , commonly known as the score function $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$S_\theta (\textbf{x})$$\end{document}$ [54, 55]. The score function $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$S_\theta (\textbf{x})$$\end{document}$ is a vector field that describes the direction in which the metal ions should move to reach their favorable positions from a given point in 3D space.

The second challenge arises from the lack of sufficient training data in certain regions of the protein. To address this, the true data distribution is ’evolved’ into a known distribution, typically a normal distribution [27, 56]. Through this process, the model diffuses data (i.e., metal ions) across the entire three-dimensional space conditioned on the protein structure, effectively filling in knowledge gaps and learning from a broader range of information across the protein structure. The entire architecture is depicted in the Supporting Information.

Forward diffusion and training

This evolution is facilitated by the forward step of the diffusion process, which is governed by a forward stochastic differential equation (SDE), described as:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} d\textbf{x} = f(\textbf{x}, t) \, dt + g(t) \, d\textbf{w}, \end{aligned}$$\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{x}$$\end{document}$ represents the positions of all metal ions, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t$$\end{document}$ denotes time, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{w}$$\end{document}$ refers to Gaussian noise or Brownian motion, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$g(t)$$\end{document}$ is the diffusion coefficient, and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f(\textbf{x}, t)$$\end{document}$ is the drift coefficient. In our case, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f(\textbf{x}, t) = 0$$\end{document}$ , and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$g(t)$$\end{document}$ is defined as $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sqrt{\frac{d\sigma ^2(t)}{dt}}$$\end{document}$ . The variance $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma ^2(t)$$\end{document}$ evolves according to a hyperparameter of the model:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \sigma _t = \sigma _{\text {min}}^{(1 - t)} \cdot \sigma _{\text {max}}^{t}, \end{aligned}$$\end{document}

leading to the forward SDE:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} d\textbf{x} = \sqrt{\frac{d\sigma ^2(t)}{dt}} \, d\textbf{w}. \end{aligned}$$\end{document}

In other words, for a given protein-metal complex, as Gaussian noise $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{w}(t)$$\end{document}$ is added and given the original distribution of metal ion positions at $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t = 0$$\end{document}$ , denoted as $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{x}(0)$$\end{document}$ , the metal position at time $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t$$\end{document}$ , $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{x}(t)$$\end{document}$ , can be numerically calculated through the equation above. Translation perturbation vectors, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta \textbf{r}$$\end{document}$ , also follow a Gaussian distribution with the mean $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu (t)$$\end{document}$ and variance $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma ^2(t)$$\end{document}$ , which allows us to compute the gradient of the log probability of metal translations over the protein structure $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{y}$$\end{document}$ using the following equation:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \nabla \log p_t(\Delta \textbf{r} \mid \textbf{y}) = -\frac{\Delta \textbf{r}(t) - \mu (t)}{\sigma ^2(t)} \end{aligned}$$\end{document}

Meanwhile, the score function $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$S_\theta (\textbf{x})$$\end{document}$ is generated by a neural network, where the inputs to the network are the locations of the metal ions, the protein structure, and the time $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t$$\end{document}$ . The parameters of the neural network, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\theta$$\end{document}$ , are optimized by minimizing the loss function $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L_\theta$$\end{document}$ :

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} L_\theta = \mathbb {E}_{p(\textbf{x})} \left[ \Vert \nabla \log p_t(\Delta \textbf{r} \mid \textbf{y}) - S_{\theta }(\textbf{x}, \textbf{y}, t)\Vert ^2_2\right] \end{aligned}$$\end{document}

The expectation value $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathbb {E}_{p(\textbf{x})}$$\end{document}$ is calculated by averaging the $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L_2$$\end{document}$ -norm between the true vector field $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\nabla \log p_t(\Delta \textbf{r} \mid \textbf{y})$$\end{document}$ and the predicted vector field $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$S_{\theta }(\textbf{x}, \textbf{y}, t)$$\end{document}$ across the metal distribution in the training data. We trained the model for 400 epochs employing this loss function.

Reverse diffusion and sampling

In the forward diffusion process, given a protein structure, metal ions diffuse from their most favorable positions toward a Gaussian distribution in 3D space as noise is gradually added. During this phase, the score model learns the relationship between the favorable distribution at $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t=0$$\end{document}$ and the perturbed distribution at time $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t$$\end{document}$ .

Now, by reversing time, we use the well-trained score function $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$S_\theta (\textbf{x})$$\end{document}$ to solve the reverse stochastic differential equation (SDE) and compute the favorable positions of metal ions from a random distribution:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} d\textbf{x} = \left[ f(\textbf{x}, t) - g(t)^2 S_\theta (\textbf{x}, \textbf{y}, t)\right] dt + g(t) d\textbf{w}, \end{aligned}$$\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$S_\theta (\textbf{x}, t)$$\end{document}$ is the learned score function from the training phase, and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f(\textbf{x}, t)$$\end{document}$ and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$g(t)$$\end{document}$ represent the drift and diffusion terms from the forward SDE. For each protein structure, we perform inference by randomly initializing 100 candidate metal ion positions across the protein, ensuring that even sparsely represented or atypical coordination environments are sampled. These ions are guided by the score model $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$S_\theta (\textbf{x})$$\end{document}$ , ultimately reaching their most favorable positions within the protein structure (Fig. 6).Fig. 6. Fundamental Theory of the Score-based Generative Diffusion Model for Metal Ions in Proteins. The forward continuous-time stochastic differential equation (SDE) transitions the true locations of metal ions (top left) to random locations (top right). The score at each intermediate time step is predicted by a deep learning neural network, enabling the reverse process of the SDE. The grey part of the protein (top) represents the atomic structure of the protein surrounding the metal ions at their original positions

Confidence model for filtering

The confidence model in SuperMetal is built on an SE(3)-equivariant convolutional network [42], adapted from the diffusion model architecture. Its primary objective is to classify metal ion positions sampled from the score model as either ‘good’ or ‘bad’ quality for filtering purposes. A “good” position is defined by having a mean absolute distance (MAD) below a certain threshold.

To train the confidence model, multiple samples are generated for each training complex using the trained diffusion model, and the MAD for each sampled metal ion position is computed. Labels are assigned by comparing these MAD values to the defined threshold, which in our experiments is set to 5 Å. The model is trained using cross-entropy loss to classify positions accordingly. The final layer of the confidence model employs a fully connected layer applied to the mean-pooled scalar representations from the last convolutional layer, producing an SE(3)-invariant confidence score. This score predicts the likelihood that a candidate metal ion position is favorable, ensuring that low-confidence positions can be filtered out. Accurate prediction of these confidence scores is essential for the post-processing step, as it allows for the precise distinction and placement of metal ions within the protein structure. The detailed architecture is provided in the Supporting Information.

Clustering mechanism

The final step of the SuperMetal pipeline focuses on optimizing prediction precision through clustering of the predicted metal positions. After obtaining the sampled metal ion positions from the diffusion model, along with their confidence scores from the confidence model, a refinement process is applied. Positions with confidence scores below a defined threshold are filtered out, and the remaining high-confidence positions are clustered. A cluster is formed if it contains at least two metal positions within a 5 Å radius, using DBSCAN [57] algorithm implemented in scikit-learn [58]. For the DBSCAN algorithm, we set $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\epsilon$$\end{document}$ (the maximum distance between two points to be considered part of the same cluster) to 5 Å, consistent with the threshold for true positive metal predictions, and min_samples (the minimum number of points required to form a cluster) to 2, as sampled metal locations are relatively sparse. For each cluster, the final metal ion position is determined by averaging the coordinates of all metal ions within the cluster, resulting in one metal ion per cluster.

Evaluation and comparison

To evaluate the quality of predicted metal ion positions, we use precision, recall (coverage), and mean absolute deviation (MAD) as our primary metrics. These metrics provide a direct and intuitive measure of how accurately we identify metal-binding sites and how closely they match experimentally determined positions. In metal-binding site prediction, capturing all actual sites (i.e., maximizing recall) often takes precedence over penalizing false positives, which may reflect unobserved or transient sites.

Here, Precision is defined as the ratio of correctly predicted metal sites (true positives, TP) to the total number of predicted sites, including both correctly predicted sites and false positives (FP). Mathematically, precision is expressed as:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \text {Precision} = \frac{\text {TP}}{\text {TP} + \text {FP}} \end{aligned}$$\end{document}

Recall, or coverage, measures the percentage of correctly predicted metal sites relative to the total number of true metal sites. This is represented as the ratio of true positives (TP) to the sum of true positives and false negatives (FN), where FN represents the true metal sites that were not detected:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \text {Coverage} = \frac{\text {TP}}{\text {TP} + \text {FN}} \end{aligned}$$\end{document}

A metal site is considered correctly predicted (TP) if the predicted position falls within 5 Å of the experimentally determined site. False positives (FP) are predicted sites that do not meet this criterion, while false negatives (FN) are true metal sites that the model fails to predict.

To quantify the positional precision of the predictions, we calculate the mean absolute deviation (MAD). We chose MAD over Mean Squared Error (MSE) because MAD is more straightforward to interpret in the context of spatial distances and provides a clear measure of prediction accuracy in terms of physical distance. MAD measures the average absolute difference between the correctly predicted metal positions, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\textbf{x}}_i$$\end{document}$ , and the corresponding experimental positions, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\textbf{x}_i$$\end{document}$ , and is computed as:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \text {MAD} = \frac{1}{n} \sum _{i=1}^{n} \Vert \textbf{x}_i - \hat{\textbf{x}}_i\Vert \end{aligned}$$\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n$$\end{document}$ is the number of predicted metal sites. Lower MAD values indicate a closer match to the true positions, providing a complementary measure to precision and recall in assessing the overall performance of the method. These metrics enable a comprehensive comparison between SuperMetal and other existing methods.

Supplementary Information

Supplementary material 1

Bibliography9

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Zhang Y, Huang W, Wei Z, Yuan Y, Ding Z(2023) Equipocket: an e (3)-equivariant geometric graph neural network for ligand binding site prediction. ar Xiv preprint ar Xiv:2302.12177
2Song Y, Sohl-Dickstein J, Kingma Diederik P, Kumar A, Ermon S, Poole B(2020) Score-based generative modeling through stochastic differential equations. ar Xiv preprint ar Xiv:2011.13456
3Corso G, Stärk H, Jing B, Barzilay R, Jaakkola T (2022) Diffdock: diffusion steps, twists, and turns for molecular docking. ar Xiv preprint ar Xiv:2210.01776
4Ketata MA, Laue C, Mammadov R, Stärk H, Wu M, Corso G, Marquet C, Barzilay R, Jaakkola TS (2023) Diffdock-pp: rigid protein-protein docking with diffusion models. ar Xiv preprint ar Xiv:2304.03889
5Kreis K, Dockhorn T, Li Z, Zhong E(2022) Latent space diffusion models of cryo-em structures. ar Xiv preprint ar Xiv:2211.14169
6Metz L, Poole B, Pfau D, Sohl-Dickstein J (2016) Unrolled generative adversarial networks. ar Xiv preprint ar Xiv:1611.02163
7Geiger M, Smidt T (2022) e 3nn: Euclidean neural networks. ar Xiv preprint ar Xiv:2207.09453
8Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, Ronneberger O, Willmore L, Ballard AJ, Bambrick J et al (2024) Accurate structure prediction of biomolecular interactions with alphafold 3. Nature, pp 1–310.1038/s 41586-024-07487-w PMC 1116892438718835 · doi ↗ · pubmed ↗