PickET: An unsupervised method for localizing macromolecules in cryo-electron tomograms

Shruthi Viswanath; Shreyas Arvindekar; Omkar Golatkar

PMC · DOI:10.21203/rs.3.rs-7721609/v1·October 22, 2025

PickET: An unsupervised method for localizing macromolecules in cryo-electron tomograms

Shruthi Viswanath, Shreyas Arvindekar, Omkar Golatkar

PDF

Open Access

TL;DR

PickET is a new method that automatically finds macromolecules in cryo-electron tomograms without needing prior knowledge or manual input.

Contribution

PickET introduces an unsupervised approach for localizing macromolecules in cryo-ET data without requiring expert annotations or known structures.

Findings

01

PickET successfully localizes macromolecules of varying shapes, sizes, and abundance in diverse tomograms.

02

The method enables 3D classification and de novo structural characterization using predicted particle localizations.

03

PickET is efficient, scalable, and suitable for high-throughput analysis of cryo-ET datasets.

Abstract

Cryo-electron tomography (cryo-ET) datasets are rich sources of information capable of describing the localizations, structures, and interactions of macromolecules. However, most current methods for localizing particles in cryo-electron tomograms are limited to macromolecules with known structures, require extensive manual annotations, and/or are computationally expensive. Here, we present PickET, a method for localizing macromolecules in tomograms that does not rely on expert annotations and prior structures. Its performance is demonstrated on a diverse dataset comprising over a hundred tomograms from publicly available datasets, varying in sample types, sample preparation conditions, microscope hardware, and image processing workflows. We demonstrate that PickET can simultaneously localize macromolecules of various shapes, sizes, and abundance. The predicted particle localizations can…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species6

Homo sapiens(human · species)Saccharomyces cerevisiae(baker's yeast · species)Schizosaccharomyces pombe(fission yeast · species)Thermoanaerobacter kivui(species)Chlamydomonas reinhardtii(species)Bacteria Latreille et al. 1825(Bacteria stick insect · genus)

Chemicals4

ice DS MiLoPYP Gallium

Figures4

Click any figure to enlarge with its caption.

A](#F2), [Table 1](#T1), Fig. S1, Materials and Methods, Assessment). On the simulated tomogram dataset, all the PickET workflows performed better than the random baseline ([Fig. 2A](#F2), Fig. S1). Among the workflows, Gabor-GMM-CC (and Intensities-GMM-CC) performed the best with median F1-scores of 0.72 (and 0.71), respectively ([Fig. 2A](#F2), Fig. S1, Fig. S2). These workflows were also efficient, taking an average of ~ 12 minutes and ~ 26 minutes per tomogram, respectively (Fig. S3A). We show an example of semantic and instance segmentation using Gabor-GMM-CC on the simulated dataset ([Fi

A](#F4)). On the lysate and the plasma-FIB-milled *C. reinhardtii* datasets (CZI-DS-10440 and CZI-DS-10301, respectively), PickET performed significantly better than MiLoPYP in terms of the relative recall. On the remaining two gallium-FIB-milled *S. pombe* and *T. kivui* datasets (CZI-DS-10001 and CZI-DS-10008, respectively), MiLoPYP performed considerably better than the selected PickET workflow ([Fig. 4B](#F4)). Overall, MiLoPYP predicted significantly more particles than PickET, which resulted in a higher recall and lower precision in its predictions ([Fig. 4A](#F4)–[4C](#F4)). In terms of

Equations1

Funding2

—Department of Atomic Energy (DAE) TIFR
—Department of Biotechnology (DBT)

Keywords

Cryo-electron tomographyunsupervised learningparticle-pickinglocalizationprotein complexesmacromolecular assemblies

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdvanced Electron Microscopy Techniques and Applications · Electron and X-Ray Spectroscopy Techniques · ATP Synthase and ATPases Research

Full text

Introduction

Cryo-electron tomography (cryo-ET) is an imaging modality that enables the visualization of entire cells or lamellae milled from them at nanometer resolutions (Ng & Gan, 2020). In contrast with the traditionally used reductionist approach in structural biology, cryo-ET enables the structural characterization of macromolecules in their native cellular environment (Asano et al., 2016; Beck & Baumeister, 2016; Gubins et al., 2020; Lamm et al., 2022; Ng & Gan, 2020). The generated tomograms are rich in information as they provide a spatial description of the cellular proteome at nanometer and potentially sub-nanometer resolution (Mahamid et al., 2016; Xue et al., 2022). Comprehensively mapping macromolecules in tomograms to characterize their spatial distributions and interactions to build molecular atlases of cells is known as visual proteomics (Asano et al., 2016; Beck & Baumeister, 2016; Förster et al., 2010; Robinson et al., 2007; Turk & Baumeister, 2020). A major bottleneck in visual proteomics lies in localizing macromolecules in tomograms (Arvindekar et al., 2024; Majila et al., 2025). The missing wedge, the low signal-to-noise ratio in the data, the crowded cellular environment, and the considerable variation among particles in terms of size, shape, and abundance in tomograms make this task challenging (Y. Chen et al., 2014; Gubins et al., 2020; Lučić et al., 2013; Maurer et al., 2024b; Moebel et al., 2021; Pyle & Zanetti, 2021).

Template matching (TM) is a commonly used approach for localizing macromolecules with known structures in tomograms (Böhm et al., 2000; Frangakis et al., 2002; Maurer et al., 2024b; Pyle & Zanetti, 2021). In TM, a low-pass filtered reference template of a macromolecule is used to obtain the positions and orientations of the target macromolecule in the tomogram (Böhm et al., 2000; Frangakis et al., 2002; Maurer et al., 2024b). Several software packages have been developed for template matching, including recent ones that show significant improvements in accuracy and efficiency (Castaño-Díez et al., 2012, 2017; Cruz-León et al., 2024; Hrabe et al., 2012; Lucas et al., 2021; Maurer et al., 2024a, 2024b; Pyle & Zanetti, 2021; Wan et al., 2024). However, TM is limited to macromolecules with known experimental structures or confident computational structure predictions (e.g., from AlphaFold (Abramson et al., 2024; Jumper et al., 2021)). TM does not account for conformational and compositional heterogeneity, and often fails to distinguish between structurally similar particles (Jin et al., 2024; Kim et al., 2023; Maurer et al., 2024b). In addition, localizing small particles using TM, especially in crowded environments, is still challenging (Cruz-León et al., 2024; Martinez-Sanchez, 2025). Moreover, the high false positive rate often requires complex and time-consuming post-processing (Cruz-León et al., 2024). Lastly, TM is computationally expensive and has low throughput, limiting its usability for visual proteomics (Martinez-Sanchez, 2025).

To overcome the limitations of TM, several deep learning-based approaches have been developed (M. Chen et al., 2017; de Teresa-Trueba et al., 2023; Huang et al., 2024; Liu et al., 2024; Moebel et al., 2021; Rice et al., 2023; Uddin et al., 2024; Zeng et al., 2018, 2023). Supervised learning-based approaches have been shown to outperform TM (M. Chen et al., 2017; de Teresa-Trueba et al., 2023; Gubins et al., 2020; Lamm et al., 2022, 2024; Last et al., 2024; Moebel et al., 2021; Wagner et al., 2019). However, these methods often do not generalize across domains (e.g., different specimen types and microscope hardware). Usually, they can localize only the macromolecules used in their training, and often fail to localize less abundant particles (Cruz-León et al., 2024). Moreover, they require a considerable amount of reliable training data, which is time-consuming, labor- and compute-intensive to obtain, since it relies on template matching and/or manual annotation (Martinez-Sanchez, 2025).

To extend the usability of deep learning-based methods by reducing the dependence on extensive annotations, several supervised and self-supervised representation learning-based methods have been developed (Huang et al., 2024; Rice et al., 2023; Zeng et al., 2018, 2023). In these methods, features describing subvolumes from the input tomogram are learnt by a neural network. These features are then used to cluster subvolumes containing structurally similar particles. Further, fully unsupervised methods have also been developed (Jin et al., 2024; Martinez-Sanchez et al., 2020; Uddin et al., 2025). In (Martinez-Sanchez et al., 2020; Uddin et al., 2025), the features for clustering subvolumes are not learnt, but instead derived from discrete Morse theory and pre-trained models from computer vision, respectively. Although unsupervised methods hold great promise for enabling high-throughput particle localization in tomograms, their routine use in particle localization is still limited, as they rely on substantial manual input and require improvements in both efficiency and accuracy (Cruz-León et al., 2024; Huang et al., 2024; Martinez-Sanchez, 2025; Wagner & Raunser, 2025).

Here, we present PickET, a library of workflows for high-throughput unsupervised localization of macromolecules in cryo-electron tomograms that does not rely on extensive training data or prior structures of macromolecules. It is efficient, scalable to large datasets, and requires minimal user input. We demonstrate the performance of PickET on 133 tomograms from five publicly available datasets that vary in the biological specimens studied, sample types imaged, sample thinning process used, imaging hardware used, image processing software employed for generating the reconstructions, and the particle types annotated (de Teresa-Trueba et al., 2023; Dietrich et al., 2022; Ermel et al., 2024; Harrington et al., 2024; Khavnekar et al., 2023; Peck et al., 2024; Rice et al., 2023). PickET can simultaneously localize macromolecules of various shapes, sizes, and abundance. It is comparable to another self-supervised particle localization method, MiLoPYP (Huang et al., 2024), outperforming the latter on a recent dataset generated using the newer plasma-FIB milling technology. PickET predictions can be used for 3D classification using methods such as (Scheres, 2012) and de novo structural characterization, as well as to supplement template matching and manual annotations for developing advanced particle localization methods. Efficient and scalable unsupervised methods for localizing macromolecules in tomograms, such as PickET, can contribute to high-throughput visual proteomics studies.

Results

Overview of PickET

PickET is a modular Python library for unsupervised localization of particles in cryo-electron tomograms. A typical PickET run on an input tomogram involves two steps (Fig. 1, Materials and Methods). The first step generates a semantic segmentation that separates the voxels associated with particles from the ones associated with the background in the input tomogram. Features describing each voxel are extracted from a 3D sub-volume or neighborhood around the voxel. These features are then used to cluster the voxels into two groups—particles and background—resulting in a binary semantic segmentation. In the second step, the binary semantic segmentation is converted to an instance segmentation, distinguishing individual particle instances. The geometric centroids of the particle instances in the instance segmentations are the predicted particle localizations. Owing to the modular architecture of the PickET library, each of the three feature extraction modes – Intensities, Fast Fourier Transform (FFT), and Gabor – can be paired with either of the clustering algorithms – K-Means and Gaussian Mixture Models (GMM) – providing six semantic segmentations for the input tomogram in the first step. The six semantic segmentations can be processed with either of the two particle extraction algorithms – connected component labeling (CC) and watershed segmentation (WS) – in the second step, providing twelve sets of particle localizations corresponding to the twelve workflows in PickET (See Materials and Methods).

PickET localizes macromolecules in tomograms

First, we compared the PickET workflows on a dataset comprising 88 simulated tomograms in terms of their precision, recall, and F1-score (Fig. 2A, Table 1, Fig. S1, Materials and Methods, Assessment). On the simulated tomogram dataset, all the PickET workflows performed better than the random baseline (Fig. 2A, Fig. S1). Among the workflows, Gabor-GMM-CC (and Intensities-GMM-CC) performed the best with median F1-scores of 0.72 (and 0.71), respectively (Fig. 2A, Fig. S1, Fig. S2). These workflows were also efficient, taking an average of ~ 12 minutes and ~ 26 minutes per tomogram, respectively (Fig. S3A). We show an example of semantic and instance segmentation using Gabor-GMM-CC on the simulated dataset (Fig. 2B, 2C).

Next, we compared the PickET workflows on a dataset comprising 45 real-world tomograms (Table 1, Materials and Methods, Datasets). For real-world datasets, the ground truth annotations were available only for a few particle types. Whereas, PickET is not limited to specific particle types. With incomplete annotations, the precision may be underestimated, leading to a misleading F1-score. As an alternative, we used relative recall that balances the recall on the PickET predictions with that of random guessing for the same number of predictions, without explicitly relying on precision (Materials and Methods, Assessment). Among the workflows, Gabor-KMeans-WS (and FFT-KMeans-WS) performed the best, with median relative recalls of about 0.82 (0.82), 0.57 (0.58), 0.47 (0.42), and 0.39 (0.36) for the lysate (CZI-DS-10440), plasma-FIB milled C. reinhardtii (CZI-DS-10301), Gallium-FIB-milled S. pombe (CZI-DS-10001), and Gallium-FIB-milled T. kivui (CZI-DS-10008) datasets, respectively (Fig. 2D, Fig. S4, Fig. S5, Table 1). Among the best-performing workflows, Gabor-KMeans-WS was the most efficient, requiring, on average, approximately 15 minutes per tomogram on the CZI-DS-10301 dataset (Fig. S3B). We show example semantic and instance segmentations using Gabor-KMeans-WS on real-world datasets (Fig. 2E, 2F, Fig. 3).

PickET can localize large, abundant particles with known structures, such as ribosomes, that can also be annotated by template matching (Fig. 3, Fig. S6, Table S1). In addition, it can also localize small, less abundant particles (Fig. S6, Table S1). Further, the instance segmentations suggest that PickET also localizes particles that were not annotated in the ground truth, indicating that it can be used to supplement the annotations (Fig. 2C, 2F, Fig. 3). In summary, PickET can simultaneously and efficiently localize macromolecules of various sizes, shapes, and abundance (Fig. 3, Fig. S6, Table S1).

Comparison to another particle localization method

Next, we compared the best-performing PickET workflows – Gabor-GMM-CC for simulated tomograms and Gabor-KMeans-WS for the real-world tomograms – with MiLoPYP, another particle localization method (Fig. 4). For this comparison, only the first step (Cellular Content Exploration) of MiLoPYP was used (Materials and Methods, Comparison with existing methods). The two methods were compared on all 133 tomograms used in this study based on the precision, recall, and F1-score on the simulated dataset and the relative recall on the real-world datasets (Table 1, Fig. 4). In addition, the two methods were compared based on the number of particles predicted and the total time taken (Fig. 4).

On the simulated dataset, PickET had a higher precision, lower recall, and higher F1 score compared to MiLoPYP (Fig. 4A). On the lysate and the plasma-FIB-milled C. reinhardtii datasets (CZI-DS-10440 and CZI-DS-10301, respectively), PickET performed significantly better than MiLoPYP in terms of the relative recall. On the remaining two gallium-FIB-milled S. pombe and T. kivui datasets (CZI-DS-10001 and CZI-DS-10008, respectively), MiLoPYP performed considerably better than the selected PickET workflow (Fig. 4B). Overall, MiLoPYP predicted significantly more particles than PickET, which resulted in a higher recall and lower precision in its predictions (Fig. 4A–4C). In terms of efficiency, the full PickET library as well as the single selected workflow take more time than MiLoPYP (Fig. 4D). This may be attributed to the tomogram-specific design of PickET, which processes a single tomogram at a time. The selected PickET workflow (Gabor-KMeans-WS) took ~ 1080 minutes versus ~ 244 minutes for MiLoPYP on 88 simulated tomograms on a workstation (Fig. 4D). However, the efficiency of PickET can be easily improved in at least two ways. One, by trivially parallelizing runs across tomograms and across workflows. Two, by using a small subset of tomograms to estimate the clustering parameters before applying them to the entire dataset. In summary, PickET performs comparably to MiLoPYP, with MiLoPYP performing better on tomograms from gallium FIB-milled samples. In contrast, PickET performs better on lysates and, importantly, it significantly outperforms MiLoPYP on the tomograms from the newer plasma FIB-milled workflows.

Influence of particle characteristics on localization performance

First, we examined how the performance of PickET varies with particle size. About 77% (83 out of 108) of the macromolecules in the simulated dataset, ranging in size from 60 kDa to 2.8 MDa, were localized by PickET with an average particle-wise recall of at least 0.7. Out of 62 macromolecules with molecular weight greater than 500 kDa, 50 (~ 81%) were localized by PickET with an average particle-wise recall of at least 0.7 (Fig. S6A), showing that PickET is reasonably accurate at localizing medium to large macromolecules. Moreover, out of 46 macromolecules smaller than 500 kDa, 33 (~ 72%) were localized by PickET with an average particle-wise recall of at least 0.7, demonstrating that PickET can also localize small macromolecules reasonably well. A similar trend was seen for the dependence of particle-wise recall on radius of gyrations (Fig. S6B). Similarly, we also demonstrated the particle-wise recall for PickET in the real-world datasets (Table S1). In summary, PickET can localize macromolecules of various sizes and shapes.

Second, we sought to identify which particle shapes are easier or more challenging to pick. Interestingly, both PickET and MiLoPYP showed poor recall on the same particle types. For both methods, particles that were small, hollow and/or had an elongated (4wrm, 7s7k, 4uic), or donut-like (6up6) shape were challenging to localize; whereas, large, compact particles (5g04, 6id1, 6w6m, 6jy0) were localized well by both methods (Fig. 4E).

Discussion

Here, we developed PickET, a library of workflows for fully unsupervised localization of macromolecules in cryo-electron tomograms. It can be used in conjunction with 3D classification and sub-tomogram averaging for the de novo structural characterization of macromolecules. Given the efficiency and precision of PickET (Fig. 2A, Fig. S3), it can be used to supplement template matching and manual annotations for training deep learning-based particle localization and identification methods. In addition, PickET-generated semantic segmentations can be used in area-selective template matching approaches (Last et al., 2024) to improve the localization of target macromolecules in tomograms.

The tomograms generated from different sample types, sample thinning methods, and microscope hardware often vary in their signal-to-noise ratio, making it challenging to develop a particle-picking method that generalizes well across datasets. PickET bypasses this issue with its tomogram-specific design. We demonstrated the generalizability of PickET on 133 tomograms that vary in terms of particle types annotated, biological specimens studied, sample types being imaged, sample thinning process, imaging hardware used, and image processing software employed for generating the reconstructions (Table 1). PickET can simultaneously localize macromolecules of various shapes and sizes, ranging from 60 kDa to 2.8 MDa; even localizing several macromolecules smaller than 500 kDa (Fig. S6, Table S1). Importantly, PickET can localize macromolecules irrespective of their abundance (Fig. 3, Table S1). Moreover, PickET does not require known structures. Additionally, it is efficient, scalable to large datasets, and can be applied in a high-throughput manner. Therefore, owing to its generalizability, its non-reliance on previously characterized structures, and its ability to localize macromolecules of various shapes, sizes, and abundance, PickET can serve as a valuable tool for visual proteomics studies.

The performance on the tomograms from real-world datasets provides a realistic view of PickET’s applicability. The real-world tomograms are more complex compared to the simulated dataset, which contained a sparse distribution of particles and virtually no contaminants, such as ice crystals. On the contrary, the real-world tomograms illustrated the common challenges with tomograms, such as the crowded intracellular landscape, and artifacts from sample preparation, such as uneven lamellae and the presence of contaminants. The comparatively worse performance of the workflows from the PickET library on some of the real-world tomograms may be attributed to these challenges.

For crowded tomograms used in this study (such as CZI-DS-10001, CZI-DS-10008, and CZI-DS-10301), the Gabor-KMeans-WS workflow performed the best, whereas for comparatively sparse tomograms (such as from the simulated dataset and CZI-DS-10440), Gabor-GMM-CC worked the best (Fig. S1-S5). Across all datasets, FFTs-GMM-CC performed the worst, predicting considerably more particles than other workflows (Fig. S1-S5). This could be attributed to the dusty semantic segmentations that it generates, wherein small groups of background voxels are falsely segmented as particles. For generating semantic segmentations on crowded tomograms, K-Means clustering performed better than GMM, as the latter often resulted in larger particle segmentations that merged segmentations of neighboring particles. Similarly, for generating instance segmentations on crowded tomograms, watershed segmentation-based workflows were better suited than connected component labeling, as the latter often failed to separate neighboring particles.

In general, PickET generates good semantic segmentations that effectively separate the particle- and background-associated voxels (Fig. 2, Fig. 3). However, the methods used for generating instance segmentations often perform poorly in crowded environments. Additionally, the centroid-based particle localizations used in PickET may not be appropriate for localizing elongated, branched, or lobed particles (Fig. 4). This highlights the need for better instance segmentation and particle annotation methods.

An assumption in the unsupervised semantic segmentation step of PickET is that there is a background-particle separation. However, the clustering algorithm may instead separate fiducials, contaminants, or membranes from the rest of the tomogram. Moreover, artifacts such as unevenness in the lamellae and contaminants such as ice crystals make it challenging to distinguish particles from the background. PickET includes the option to crop the tomogram along the Z-axis before semantic segmentation, to reduce the effect of artifacts on the clustering, as well as before instance segmentation, to limit the localization to the lamella. In the future, additional preprocessing steps may be added to PickET to remove the voxels corresponding to such damages before generating the semantic segmentations. Finally, the voxel clustering-based approach used in PickET is memory-intensive. To address this, we provide options such as setting the Z-slice bounds and subsampling voxels before generating semantic segmentations in PickET to make it adaptable to the available memory.

Recent advancements in hardware and software for imaging cells using cryo-ET have led to an increase in the quality and volume of tomography data (Martinez-Sanchez, 2025). High-throughput particle localization methods that do not rely on ground truth annotations, such as PickET, hold great promise for advancing visual proteomics. The datasets made available by the Cryo-ET Data Portal (https://cryoetdataportal.czscience.com/) (Ermel et al., 2024) will further accelerate the development and evaluation of such methods.

Materials and Methods

Datasets

To evaluate the performance of PickET, we compiled a dataset by combining the tomograms and particle annotations from several publicly available datasets. Overall, the compiled dataset comprised 133 tomograms, including simulated and real-world cases (Table 1). The annotations on these tomograms encompass over a hundred different types of particles, ranging in size from about 30 kDa to over 3 MDa. The real-world tomograms in the dataset were carefully chosen to ensure diversity in terms of particle types annotated, biological specimens studied (bacteria, yeasts, mammalian cells), sample types being imaged (lysates, whole cells, and cellular lamellae), specimen processing (milling process used for obtaining the lamellae – e.g., gallium or plasma FIB), imaging hardware used, and image processing software employed for generating the reconstructions.

PickET workflow

PickET is a modular library for unsupervised localization of particles in a cryo-electron tomogram, comprising three feature extraction modes, two clustering algorithms, and two particle extraction algorithms. We define a PickET workflow as a specific selection of a feature extraction mode, a clustering algorithm, and a particle extraction algorithm. Any feature extraction mode can be used along with either of the clustering algorithms and either of the particle extraction algorithms, providing a total of twelve workflows. Each such workflow is split into two steps, with the feature extraction and clustering modules comprising the first (generating semantic segmentation) step, and the particle extraction module comprising the second (localizing particles) step (Fig. 1).

Generating semantic segmentation

The first step in all the workflows in the PickET library is to generate binary semantic segmentations for the input tomogram that classify each voxel as belonging to a particle or background.

Inputs

The input tomogram is first denoised using TomoEED with default settings (https://sites.google.com/site/3demimageprocessing/tomoeed, (Moreno et al., 2018)). Often, tomograms feature a central slab along the Z-axis, where the tomogram is the most informative, i.e., it is most likely to contain particles. We provide an option to specify the bounds on this central Z-slab to generate semantic segmentations. Using these bounds reduces the memory requirement for larger tomograms and may also help avoid the contaminants from the periphery of the lamella from confounding the clustering algorithm. Additional options are provided for reducing the memory requirements for larger tomograms (See https://github.com/isblab/pickET).

The semantic segmentation generation step can be further divided into three sections: preprocessing, feature extraction, and clustering.

Preprocessing

The input tomogram is first blurred using a Gaussian kernel with a standard deviation $[eqn]$ to enhance contrast. Then, the voxel intensities of the blurred tomogram are normalized to be between zero and one.

Feature extraction

Owing to the low signal-to-noise ratio in cryo-electron tomograms, classifying the voxels based solely on their intensity values often results in segmentations that fail to distinguish particles from the background. We found that a sub-volume around each voxel, termed the neighborhood of a voxel, provides more informative context for characterizing a voxel. We use $[eqn]$ voxel neighborhoods around all non-peripheral voxels to compute their features; a non-peripheral voxel is at least two voxels ( $[eqn]$ ) away from the edge of the tomogram. Out of the three feature extraction modes provided in the PickET library – intensities, FFT, and Gabor – the last two are GPU-accelerated.

Intensities mode

In the intensities mode, the intensities corresponding to all 125 voxels in the 5 × 5 × 5 neighborhood of a voxel are used as its features.

FFT mode

In the FFT mode, the magnitudes of the 3D discrete Fourier transform of each voxel neighborhood are used as the features of a voxel.

Gabor mode

In the Gabor mode, first, a Gabor filter bank containing $[eqn]$ Gabor filters is generated. A Gabor filter is a 3D matrix $[eqn]$ , where $[eqn]$ is the size of the voxel neighborhood. It is generated as the product of a 3D Gaussian distribution and a sinusoid. An element $[eqn]$ in such a matrix is computed as

[eqn]

Where, $[eqn]$ , and $[eqn]$ represent indices in the filter along the $[eqn]$ , and $[eqn]$ axes, $[eqn]$ is a Gaussian distribution with mean $[eqn]$ and standard deviation $[eqn]$ , and $[eqn]$ , $[eqn]$ , and $[eqn]$ are the frequencies of the sinusoid along the respective axes. In the Gabor filters used in PickET, μ, the mean of the Gaussian is placed at the centre of the filter, i.e., at $[eqn]$ , and $[eqn]$ , the standard deviation, is set to be $[eqn]$ . The phase $[eqn]$ is set to 0. For this study, ten sinusoidal frequencies were used per axis, spaced uniformly between 0 and the Nyquist frequency (0.5 cycles/voxel), resulting in a filter bank comprising $[eqn]$ filters. Convolving the filter bank with voxel neighborhoods yields $[eqn]$ responses per voxel. We then select a subset, $[eqn]$ filters with the highest response standard deviations across all voxels to define the Gabor features for a voxel.

Clustering

Based on the features, the voxels are clustered into two clusters using either of the two clustering algorithms – K-Means or GMM. Usually, the two clusters correspond to background and particle classes. By default, the smaller (larger) cluster is considered to represent the voxels associated with particles (background). However, given that this assumption may not hold for a small subset of segmentations, PickET includes an option to invert the generated cluster assignments.

Clustering may be performed in two steps if the tomogram is too large to be processed in one step or if central Z-slab bounds are provided (see Inputs). In such cases, first, the cluster parameters are estimated based on the central Z-slab. Then, these parameters are used to assign clusters to all voxels, resulting in a binary semantic segmentation, with particle (background) voxels marked with 1 (0).

Localizing particles

The second step in all the workflows in the PickET library is to localize particles in the tomogram, given a binary semantic segmentation as input. Based on the input semantic segmentation, instance segmentations that separate individual particle instances are generated using one of two particle extraction algorithms – connected component labeling (CC) and watershed segmentation (WS). Geometric centroids of individual particle instances are then computed as the predicted particle localizations.

As a result, a run of the complete PickET library generates six semantic segmentations in the first step, and twelve instance segmentations with corresponding particle localizations in the second step for each input tomogram. An optimal instance segmentation is one in which, first, the particles are well separated from the background and, second, the individual particle instances are well separated from each other. The corresponding particle localizations may be used for downstream tasks such as sub-tomogram averaging.

Assessment

Precision, recall, and F1-score

PickET workflows were assessed using precision, recall, and F1-score on the simulated dataset. For the simulated dataset, a predicted particle localization was considered a true positive if the centroid of the predicted particle was within 100 Å of a ground truth particle centroid. Similarly, this distance threshold was set to 125 Å for predictions on the real-world datasets to account for the error in the particle localizations in the ground truth information.

Relative recall

PickET workflows are not limited to specific particle types. However, the ground truth annotations in real-world datasets were available only for a few particle types. This implied that the ground truth annotations for assessing the performance of methods like PickET on real-world datasets were incomplete. In cases with incomplete ground truth, the absence of a ground truth annotation in the vicinity of a prediction suggests one of three possibilities: ground truth annotations were not generated on the particle type captured by PickET, ground truth annotations were generated for this particle type, but this instance was missed by the annotation method, or the PickET prediction is a false positive. In such cases, the precision may be underestimated, leading to a potentially misleading F1-score.

The relative recall is an alternative metric to assess the performance of PickET workflows on real-world datasets with incomplete ground truth annotations, without relying on precision. Relative recall balances the recall on the PickET predictions with the recall for the same number of predictions made by random guessing.

[eqn]

Where $[eqn]$ is the relative recall, $[eqn]$ is the recall for PickET predictions, and $[eqn]$ is the recall for the same number of particle localizations predicted at random. Relative recall ranges from 0 to 1, with values close to 1 indicating that PickET performs better than random guessing, and values near 0 indicating no improvement over random.

A comparison between the relative recall and F1-score on a simulated setup revealed that for a constant precision, the F1-score increased with an increase in the number of predictions. Whereas, relative recall decreased with an increase in the number of predictions, penalizing the model for overpredicting relative to the ground truth annotations (Fig. S7).

Comparison with existing methods

The performance of PickET was compared with that of MiLoPYP (dataset-specific cellular pattern Mining and particle Localization PYthon Pipeline), a self-supervised learning-based method on 125 tomograms (Table 1) (Huang et al., 2024). MiLoPYP is a two-step method. The first step, cellular content exploration, uses self-supervised contrastive learning for localizing particles. The second step, protein-specific particle localization, uses few-shot learning to localize particles of a specific type (for instance, ribosomes), based on a subset of user-picked particles from the first step. Since the second step is particle-specific, we compared our method using only the output from the first step of MiLoPYP. The best-performing PickET workflows, as identified earlier (Fig. 2, Fig. 3, Fig. S1-S5), were compared to MiLoPYP based on the precision, recall, and F1-score for the simulated tomograms and relative recall for the real-world tomograms (Fig. 4).

Supplementary Material

Supplementary Files

This is a list of supplementary files associated with this preprint. Click to download.

• manuscriptv5zoteroexportsuppmat.pdf

Bibliography54

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, Ronneberger O, Willmore L, Ballard AJ, Bambrick J, Bodenstein SW, Evans DA, Hung C-C, O’Neill M, Reiman D, Tunyasuvunakool K, Wu Z, ŽemgulytėA, Arvaniti E, Jumper JM (2024) Accurate structure prediction of biomolecular interactions with Alpha Fold 3. Nature 630(8016):493–500. 10.1038/s 41586-024-07487-w 38718835 PMC 11168924 · doi ↗ · pubmed ↗
2Arvindekar S, Majila K, Viswanath S (2024) Recent methods from statistical inference and machine learning to improve integrative modeling of macromolecular assemblies (Version 4). ar Xiv. 10.48550/ARXIV.2401.17894 · doi ↗
3Asano S, Engel BD, Baumeister W (2016) In Situ Cryo-Electron Tomography: A Post-Reductionist Approach to Structural Biology. Journal of Molecular Biology, 428(2, Part A), 332–343. 10.1016/j.jmb.2015.09.03026456135 · doi ↗ · pubmed ↗
4Beck M, Baumeister W (2016) Cryo-Electron Tomography: Can it Reveal the Molecular Sociology of Cells in Atomic Detail? Trends Cell Biol 26(11):825–837. 10.1016/j.tcb.2016.08.00627671779 · doi ↗ · pubmed ↗
5Böhm J, Frangakis AS, Hegerl R, Nickell S, Typke D, Baumeister W (2000) Toward detecting and identifying macromolecules in a cellular context: Template matching applied to electron tomograms. Proceedings of the National Academy of Sciences, 97(26), 14245–14250. 10.1073/pnas.230282097 · doi ↗
6Castaño-Díez D, Kudryashev M, Arheit M, Stahlberg H (2012) Dynamo: A flexible, user-friendly development tool for subtomogram averaging of cryo-EM data in high-performance computing environments. Journal of Structural Biology, 178(2), 139–151. 10.1016/j.jsb.2011.12.01722245546 · doi ↗ · pubmed ↗
7Castaño-Díez D, Kudryashev M, Stahlberg H (2017) Dynamo Catalogue: Geometrical tools and data management for particle picking in subtomogram averaging of cryo-electron tomograms. J Struct Biol 197(2):135–144. 10.1016/j.jsb.2016.06.00527288866 · doi ↗ · pubmed ↗
8Chen M, Dai W, Sun SY, Jonasch D, He CY, Schmid MF, Chiu W, Ludtke SJ (2017) Convolutional neural networks for automated annotation of cellular cryo-electron tomograms. Nat Methods 14(10):983–985. 10.1038/nmeth.440528846087 PMC 5623144 · doi ↗ · pubmed ↗