Addressing persistent challenges in digital image analysis of cancer tissue: resources developed from a hackathon
Sandhya Prabhakaran, Clarence Yapp, Gregory J. Baker, Johanna Beyer, Young Hwan Chang, Allison L. Creason, Robert Krueger, Jeremy Muhlich, Nathan Heath Patterson, Kevin Sidak, Damir Sudar, Adam J. Taylor, Luke Ternes, Jakob Troidl, Xie Yubin, Artem Sokolov, Darren R. Tyson

TL;DR
Researchers collaborated in a virtual event to develop tools for analyzing complex cancer tissue images and shared the resulting resources.
Contribution
A collaborative hackathon produced publicly available resources to address challenges in digital cancer image analysis.
Findings
Participants developed solutions for cell type classification and spatial data visualization.
Resources were created to handle artifacts and improve image representation learning.
The hackathon highlighted remaining challenges in scaling analysis for large datasets.
Abstract
The National Cancer Institute (NCI) supports numerous research consortia that rely on imaging technologies to study cancerous tissues. To foster collaboration and innovation in this field, the Image Analysis Working Group (IAWG) was created in 2019. As multiplexed imaging techniques grow in scale and complexity, more advanced computational methods are required beyond traditional approaches like segmentation and pixel intensity quantification. In 2022, the IAWG held a virtual hackathon focused on addressing challenges in analyzing complex, high‐dimensional datasets from fixed cancer tissues. The hackathon addressed key challenges in three areas: (1) cell type classification and assessment, (2) spatial data visualization and translation, and (3) scaling image analysis for large, multi‐terabyte datasets. Participants explored the limitations of current automated analysis tools, developed…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Fig. 1
Fig. 2
Fig. 3
Fig. 4- —National Cancer Institute 10.13039/100000054
- —National Science Foundation 10.13039/100000001
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBiomedical and Engineering Education · Cell Image Analysis Techniques · Single-cell and spatial transcriptomics
Introduction
1
The cellular histopathology of cancer has been studied for hundreds of years with the first description of a cellular origin of cancer being described by Rudolph Virchow in the 1850s [1]; histopathologic evaluation of tumor samples is now an essential component of standard clinical practice. The analysis of cancerous tissue samples has long been performed with direct visual inspection by trained pathologists [2], but, with the technological advancement of instruments to spatially resolve tissue samples and the concomitant increase in computational power and function, a new field of computational image analysis of cancerous tissues has evolved. With this explosion in our ability to generate digital images of cancer has come the challenge of how to quantitatively extract meaningful features from them in an automated way.
The National Cancer Institute (NCI) broadly supports research programs that foster emerging areas in cancer biology and the development of new experimental models for cancer research, including: the Cancer Systems Biology Consortium (CSBC), Physical Sciences–Oncology Network, the Human Tumor Atlas Network (HTAN), and the more recent additions of the Cellular Cancer Biology Imaging Research, Acquired Resistance to Therapy Network, Patient‐Derived Xenograft Network, and Target Enablement to Accelerate Therapy Development for Alzheimer's Disease. Imaging is a core technology among many research groups within these programs. In addition, non‐cancer‐specific initiatives such as Human BioMolecular Atlas Project (HuBMAP) [3, 4] and the Human Cell Atlas [5, 6, 7] rely on digital imaging as a major component of their programs and are generating enormous amounts of spatially resolved data with up to hundreds or thousands of measurements mappable to individual cells, at an ever‐increasing scale. Standard image processing techniques of image alignment and stitching, segmentation and cell type calling have facilitated automated extraction of quantitative features from digital images [8, 9, 10, 11], and their further refinement will continue extending their utility and capabilities. However, the next generation of imaging data comprises many more measurements per cell and covers an ever‐expanding volume of tissue per sample (i.e., big data).
The interpretation of imaging data may therefore be limited when analyzed using the traditional image processing pipeline that does not utilize the richer features of modern imaging data (e.g., larger area, higher resolution, many channels, multimodal measurements, etc.) [12].
In response to the changing landscape of digital image analysis across NIH‐funded programs, a trans‐consortia Image Analysis Working Group (IAWG) was initiated in 2019 with the objectives of disseminating the work of the multitude of imaging‐related endeavors across all of the research groups and developing collaborations to address common challenges. In January 2020, the IAWG held a workshop designed to identify challenges that could be addressed within an in‐person hackathon setting, which was hosted by Vanderbilt University in early March 2020, a few days prior to the nationwide shutdown in response to COVID‐19. Within the proceedings of the IAWG's combined workshop/hackathon we described several specific challenges centered around the traditional image processing pipeline and how they were addressed by the hackathon [13], and the IAWG has continued to identify and address outstanding challenges in the digital image analysis pipeline.
Here we discuss some of the plethora of important open questions in the analysis of multiplexed tissue images, the first steps toward a deeper understanding of these questions, what possible solutions may look like, what approaches were addressed during the Image Analysis Hackathon 2022, and where further work remains. The hackathon considered 11 challenges that fell into three main themes: (1) challenges to cell type classification and assessment, (2) translation and visual representation of spatial aspects of high dimensional data, and (3) scaling digital image analyses to large (multi‐terabyte) datasets (S1). A brief description of the execution of the hackathon is provided in S6 and all participants and their contributions to the hackathon are provided in S7. Here, we discuss our efforts toward addressing specific aspects of the main themes in the hackathon and attempt to provide a broader view of the remaining challenges to a more robust automated pipeline for large‐scale multiplexed digital image analysis.
Challenges to identifying and classifying cell types
2
Light microscopy‐based digital histopathological analysis of cancer tissues has been greatly enhanced by several recently‐developed multiplexed techniques, including: antibody‐based methods such as cyclic immunofluorescence (CyCIF) [14], iterative indirect immunofluorescence imaging [15], imaging mass spectrometry, imaging mass cytometry [16], multiplexed immunofluorescence, and co‐detection by indexing [17]; and spatial transcriptomic techniques such as multiplexed error‐robust fluorescence in situ hybridization (MERFISH) [18] and sequential fluorescence in situ hybridization [19]. These techniques increase the number of features captured per cell and have greatly expanded our understanding of the high level of variation that exists among cells, including those with very similar morphological characteristics. However, accurate quantification of these features and their association with individual cells within the spatial context of cancerous tissue largely relies on image segmentation and other techniques used to demarcate cellular boundaries. While deep learning approaches have been successfully deployed to automate this task with previously unattainable levels of accuracy (e.g., CellPose [9], Mesmer [20], Hover‐Net [21]), numerous challenges persist that limit our ability to automatically achieve accurate quantification of features at single‐cell and sub‐cellular resolution. Such limitations include (i) tissue handling artifacts (e.g., bubbles, dust and debris, crush and sectioning artifacts, etc.); (ii) technical artifacts (e.g., inaccurate image stitching, uneven illumination, and antibody aggregates) (Fig. 1A); and (iii) image projection artifacts that arise while projecting imprecise and overlapping cell boundaries from three‐dimensional structures to two dimensions resulting in apparent “lateral spillover [22].” All of these can contribute to errors in the output of segmentation, gating, and cell type calling algorithms that generally expect ideal data. Moreover, these high‐dimensional feature sets must be analyzed and visualized to compare the differences and similarities among cell types, yet no well‐accepted standardized method for these comparisons (or how to define each cell type) has been described. Within the hackathon setting, several teams worked to specifically address some aspects of these issues by developing new or leveraging existing tools to suppress the contribution of these artifacts to the process of cell type classification and visualization of the resulting classes. The specific objectives of these teams were (1) to develop a process for automatically detecting image artifacts, with either a human‐in‐the‐loop or a completely automated process; (2) to suppress the effects of image artifacts on downstream analyses; (3) to correct the effects of lateral spillover on cell type classification; and (4) to optimize the automated visualization of high‐dimensional single‐cell data representative of distinct cell classes.
Strategies for artifact detection and correction. (A) Examples of common imaging artifacts in fluorescence microscopy. From left to right: antibody aggregates (bar = 100 μm), autofluorescent lint fibers (bar = 400 μm), air bubbles causing refractive index mismatch (bar = 125 μm), antibody hindrance (broad region of low antibody reactivity; bar = 400 μm), and out‐of‐focus tissue (bar = 50 μm). (B) CyCIF (cyclic immunofluorescence) datasets used for the artifact‐related hackathon challenges, featuring human colorectal cancer and tonsil tissue (bar = 1 mm). (C) A fibrous artifact and illumination errors are visible (left) and manually annotated (middle) to facilitate its detection and suppression (right). Scale bar represents 1 mm for all panels. (D) Receiver operating characteristic curve analysis for artifact detection performance of a multilayer perceptron trained on mean immunomarker signals alone (Features, FS1 in main text, left), or Features plus segmentation‐based nuclear morphology attributes (Nuc Morph) and pixel‐level image statistics (Pixel Thumb; FS3 in main text, right). Also see S4. (E) Comparison of before (left) and after (right) automatic artifact correction. Artifacts that have been significantly reduced or unresolved are shown with green or red boxes respectively. Regions without large artifact objects displayed similar intensity ranges across serial sections. Therefore, these either required minimal correction or were left unperturbed. Example regions highlighted with blue boxes. Scale bars represent 2 mm.
The participants were provided with CyCIF datasets representing two tissue types, colorectal cancer (Fig. 1B, left) and tonsil (Fig. 1B, right), acquired at 0.65 micrometers per pixel with a 20X/0.75NA objective lens. The colorectal cancer data was acquired as part of the HTAN efforts and consisted of 25 serial sections stained with 10 rounds of a 30‐plex CyCIF [23]. The tonsil dataset underwent 9 rounds of CyCIF and was used in the development of the Multiple‐choice microscopy pipeline [10] (MCMICRO). These datasets provide examples of the variety of imaging artifacts that plague samples even under the best conditions, providing a suitable resource for the development of tools to address the challenges listed above.
Automatic artifact detection from the spatial feature table
2.1
Experimental artifacts alter the natural dynamic range of signal intensities and cause false positive signals in derived single‐cell data (Fig. 1A). An ideal quality control (QC) tool for quantitative multiplex microscopy should automatically account for artifacts imposed at all stages in the data acquisition pipeline, from pre‐analytical variables such as biospecimen quality and tissue fixation conditions, to errors in tile stitching, imaging alignment, and cell segmentation. However, the majority of QC tools for digital pathology are limited to strategies for evaluating a subset of artifacts induced by sample preparation or data acquisition such as out‐of‐focus imaging, tissue degradation, and batch effect [24, 25, 26, 27, 28, 29]. They fail to account for downstream errors in image processing such as tile stitching, image registration, and cell segmentation. One such tool for identifying and removing artifacts in tissue‐derived single‐cell data from sample preparation to image acquisition and processing is CyLinter [30], a tool designed for human‐in‐the‐loop artifact curation. While manual artifact curation is effective for limited numbers of samples (<20) probed with relatively few immunomarkers (<20), it tends to scale poorly with dataset size. Thus, methods for automated detection of microscopy artifacts are needed to enhance workflow efficiency, minimize curator burden, and mitigate human bias.
This challenge asked participants to use machine learning methods to automatically detect visual artifacts in multiplex images (S3). Multiple supervised modeling approaches were assessed for their ability to automatically identify each of five different artifact categories routinely encountered in multiplex immunofluorescence (IF) images: (1) fluorescent contaminants, (2) uneven immunolabeling, (3) coverslip air bubbles, (4) slide debris, and (5) image blur (Fig. 1A). Ground truth annotations for the different artifact categories were generated through manual curation and provided to challenge participants, allowing them to establish three different feature sets (FS) for model training (S8). The first comprised per‐cell mean signal intensities from the colorectal cancer image, spanning 21 immunomarker channels plus Hoechst nuclear dye (FS1). The second feature set (FS2) combined the features from FS1 with an additional 7 nuclear morphology attributes derived from cell segmentation outlines. The third feature set (FS3) included all features from FS1 and FS2, along with 289 pixel‐level summary statistics calculated on segmented reference cells located at the center of 30 × 30‐pixel multi‐channel image patches cropped from the full image. FS3 also incorporated independent calculations of the same pixel‐level features from regions outside the reference cell segmentation boundaries.
Participants began by scaling the feature data using standard approaches and evaluating decision‐boundary classifiers such as linear and quadratic discriminant analysis, partial least squares‐discriminant analysis [31], and support vector machines [32], but these models were found to have poor predictive power when compared to ensemble models such as random forests [33], multilayer perceptrons [17], and boosted trees [34] (Fig. 1D; S4). Notably, the addition of nuclear morphology features (FS2) did not significantly improve any of the algorithms compared to the mean intensity feature table (FS1), whereas the addition of pixel‐level features (FS3) resulted in a dramatic improvement in accuracy (Fig. 1D; S4).
Artifact correction/suppression directly on images
2.2
Visual examination of digital images is often used to assess spatial patterns at cellular and sub‐cellular resolution. The artifact detection and removal using annotations described above does not alter the images themselves but rather the extracted information, and since artifacts generally appear as large/abundant high‐intensity objects, they can be distracting and lead to incorrect conclusions especially for audiences who are unfamiliar with biological samples or even fluorescence microscopy itself. Thus, correcting or suppressing these artifacts directly in the images would be useful, even if the artifact‐induced corruption of the underlying image data cannot be properly retrieved. To this end, different approaches to suppress artifacts were developed and assessed on ground truth annotations (Fig. 1C), including deep learning methods that ranged from image in‐painting using a pre‐trained model to generative adversarial networks [35]. The test data consisted of four neighboring registered serial section images with similar content that had not been used in any training sets. Accuracy was based on mean square error and peak‐signal‐to‐noise‐ratio between suppressed artifacts and a region free from artifacts from a neighboring serial section. The accuracy for the deep learning‐based methods was low‐to‐moderate due to the limited training data, although image in‐painting produced the best results from the supervised methods. Ultimately, a simpler and more accurate unsupervised approach was found. Artifact‐corrupted pixels were first identified by comparing outlier intensities within a local region from a neighboring section (Fig. 1E). If significantly different, the artifact‐corrupted pixels would have their values replaced from corresponding regions in the neighboring sections. Thus, in the absence of artifact annotations, it was still possible to obtain comparable accuracy to the deep learning image in‐painting approach without requiring huge investment in time and labor to curate training data.
Lateral spillover correction using REDSEA
2.3
Attribution of pixel intensities to different cells in digital images is highly dependent on segmentation accuracy, especially the location of a boundary between a cell and its neighbors (Fig. 2A). Inaccuracies in the segmentation boundaries can stem from multiple reasons, such as, cells having missing or overlapping nuclei due to the plane of cut; dimming of the signals by TSA amplification and noise in the image due to slide staining issues. Any errors along this boundary can result in spatial crosstalk of marker signals and lead to nonsensical cell types. The objective for the hackathon was to correct for lateral spillover in a publicly available dataset (CyCIF‐processed tonsil sections segmented with Deepcell [36]), assess the quality of the correction, and scale the method to larger image sizes. The approach taken by the participants leveraged a previously devised method called REinforcement Dynamic Spillover EliminAtion (REDSEA) [22]. REDSEA first computes the proportion of the shared boundary between a cell and its neighbors. It then compensates for signal intensity of each channel along that boundary based on the overall expression of that channel in the cell relative to its neighbors. The method was evaluated using sets of cell type markers that are known to be mutually exclusive. Due to the restricted time available during the hackathon, evaluation proceeded using five tiles each for two subsets of the tonsil dataset: 200 × 200 or 800 × 800 pixels. The original implementation of REDSEA was developed using high‐density tissue and was discovered to crash (with a divide by zero error) when the code encountered isolated cells (i.e., cells with no immediately adjacent cells) in the image. Most isolated cells occurred around the periphery of the image, and this was either due to an artifact of cell segmentation or image tiling. Hackathon participants enabled REDSEA to identify these isolated cells as a distinct cluster in high‐dimensional feature space, while removing them from consideration of lateral spillover (Fig. 2B). Co‐expression plots for pairs of mutually exclusive markers both before and after the modified implementation of REDSEA (S5) show that we were able to reproduce REDSEA's results while also increasing the proportion of single‐positive cells. This updated implementation has also been translated from MATLAB to Python to facilitate broader use (refer to S6 for code access link).
Spatial spillover and visual comparisons of cell type calling. (A) Example of spatial crosstalk of adjacent cells in CyCIF stained images of tonsil. Boundaries of cells identified by segmentation are indicated by the dashed cyan lines and distinct cells are numbered. Pixel intensities from different markers are indicated by distinct colors. Spatial spillover of CD3 into adjacent cells is indicated by cyan arrows. Note that the cell segmentation boundaries had been previously generated and used as‐is within the hackathon; it is possible that cells 3 & 4 may represent a single oversegmented cell. Scale bar represents 20 μm. (B) Uniform Manifold Approximation and Projection (UMAP) of cell features and spatial representation of cells in a 200 × 200px tile before and after reinforcement dynamic spillover elimination (REDSEA). A novel Cluster 5 identified by REDSEA captures isolated cells at the image border (indicated by triangles). (C) A traditional heatmap and (D) violin‐matrix of cell data separated into clusters using the hierarchical density‐based spatial clustering of applications with noise (HDBSCAN) [37] algorithm. (E) Visualizations generated by a web‐based interactive tool for inspecting and comparing clustered data in a spatial context. Scatterplots of cells in UMAP embeddings with cells colored by cluster membership based on the respective clustering algorithms (top row) and colored by silhouette coefficients (bottom row). The plots are synchronized in navigation (zooming, panning, selections).
Analysis of cell type classification
2.4
To visually and quantitatively assess cell type calling it is common practice to perform some type of clustering based on cell‐specific features. However, the lack of ground truth and the diversity of possible preprocessing inaccuracies make clustering quality hard to judge using statistical summary measures. For effective quality control and intervention, it is thus essential to integrate biomedical researchers into the analysis loop. This can be achieved through data visualization and interactive interfaces, allowing experts to inspect and compare outcomes and to make decisions on which algorithm and parameter settings perform the best. A challenge to these visualizations is that there are numerous algorithms and data reduction techniques that can be used to group objects (cells) using similarity measures, each with specific advantages and disadvantages. Likewise, there are many approaches for comparing clusterings [38, 39], but none of them can be considered standard.
Because there is no gold standard nor ground truth for assessing cell type clustering, the participants of the challenge sought to develop visualization tools that would aid users in selecting from several cell type clustering approaches. The participants considered two alternative strategies: (1) static graphical representations of cluster quality, and (2) an interactive web‐based platform for dynamic exploratory analyses.
The first approach involved creating a range of easily configurable summary heatmaps that provide more details on the clusters' quality while maintaining a high‐level overview of the entire dataset. Compared to traditional heatmaps (or matrices) [37] that show mean marker values per cluster of cells (Fig. 2C), violin‐matrices additionally display marker distributions for each cluster (Fig. 2D) as small multiples. Alternatively, marker intensity distribution within a cluster can be visualized in even more detail, on a single‐cell level, by subdividing each cluster's cells into thin bars, i.e., a heatmap (S1). To gauge cluster quality even further, a color encoding of computed silhouette scores enables rapid identification of clusters with, e.g., many cells of low silhouette score, a large variety of scores, or groups of low‐scoring instances within a cluster, indicating the cluster may be better split into smaller clusters. We also output the relative size of each cluster—as a percentage of cells—to more easily compare across methods (S9).
The second approach involved the development of a fully interactive web application to explore multiple aspects of the data. The application allows inspection and simultaneous comparison of different clustering algorithms (k‐means, density‐based, Gaussian mixture models, self‐ordered maps, and Leiden) side‐by‐side in a spatial context, with coordinated zooming and panning of subregions within the plots, and coloring individual cells by cluster identity, expression value, or silhouette coefficients (Fig. 2E). The web application scales to displaying large datasets with two or three uniform manifold approximation and projection (UMAP) axes and real‐time adjustment of viewing angles and zoom ranges. This was achieved by utilizing scatter‐gl (refer to S6 for code access link), a webgl‐accelerated 2D and 3D scatter plot point renderer that is part of Tensorflow's Embedding Projector [40]. The resulting tools are available on GitHub [41] and can be installed on most personal computers.
Summary, future work, and remaining challenges to identifying and classifying cell types
2.5
Significant strides have been made in the development of robust image analysis tools that can effectively handle segmentation errors and image artifacts. However, further advancements are necessary to ensure their widespread effectiveness. One major challenge is the requirement for more training data with well‐established ground truth to thoroughly evaluate and improve each implementation's accuracy and generalizability. This is particularly true for the multilayer perceptron model, which was identified as the optimal solution for automatic detection of artifacts. With all deep learning models, additional training tends to improve their accuracy with the added costs of time and computational resources. An intriguing prospect lies in coupling automatic artifact detection with the ability to automatically correct these artifacts within the images. This integrated approach could provide substantial benefits, streamlining the analysis workflow and increasing overall accuracy.
To enhance the overall utility of each of these tools, it is crucial to improve their scalability and automation capabilities, as well as to evaluate their performance across diverse datasets encompassing various imaging modalities, resolutions, tissues, and segmentation methods to determine ideal conditions suited for each tool. Specifically, toward addressing lateral spillover with REDSEA, the incorporation of non‐membrane markers may reduce the reliance on manual parameter tuning.
Tools to rapidly evaluate and compare various clustering algorithms and the underlying feature data that give rise to the different resulting clusters are needed. This is true not only for imaging data but for high‐dimensional clustering, in general, including single‐cell RNA sequencing data [42, 43, 44, 45]. Silhouette‐heatmaps offer one solution for static visualization, but other metrics and visualizations may also be considered. Additionally, different cluster quality measures could be incorporated into the web‐based visualization tool to aid in the assessment of clustering outcomes.
Image representation learning
3
Emerging multiplexed imaging technologies create images consisting of a large number of markers and provide high‐dimensional single‐cell feature tables, but analyzing single‐cell multiplex imaging data is still primarily limited to extracting a single mean intensity value per channel, per cell. This classical image feature extraction approach is biased toward known and easily measured features, does not fully leverage multiplex imaging information, and can miss subtle but important sub‐cellular features, such as marker polarity and staining colocalizations across markers that might indicate divergent cell states. Improving the number, nuance, and kinds of features extracted from multiplex imaging will improve phenotyping and give researchers a better understanding of cell states, cell population heterogeneity, cell–cell communication, and intercellular regions. However, the financial and temporal costs of multiplexed imaging approaches are higher compared to the traditional imaging modalities, such as hematoxylin and eosin (H&E) staining and brightfield microscopy. To bridge the gap, recently proposed deep learning approaches can extract relevant morphological features from low‐dimensional (e.g., H&E stained) images that are not easily perceived by the human eye and use those features to infer the expression of molecular markers; these approaches are collectively referred to as virtual staining [46, 47, 48]. It remains a challenge to provide such data for human visualization in a convenient and browsable manner. Although methods for photography, brightfield imaging, and low‐plex immunofluorescence data generate RGB thumbnails by downsizing and cropping, these are not immediately generalizable for data with 100 channels without losing interpretability. Here we discuss methods for feature extraction using variational autoencoders (VAEs), automated summarizing of high‐dimensional data with thumbnail images, and predicting marker staining intensities from H&E images.
Feature extraction with VAEs
3.1
VAEs have previously been successfully trained on various biomedical data modalities such as bulk and single‐cell gene expression, and imaging data [49]. A potential challenge to VAE feature extraction from single‐cell imaging data is an abundance of unimportant or uninformative morphological features driving differences between biologically similar images and skewing the results in undesired ways [11, 50].
Hackathon participants trained several forms of VAEs, including standard VAE, β‐VAE [51], invariant conditional VAE (C‐VAE) [52], and Multi‐Encoder VAE (ME‐VAE) [11], on immunofluorescence images of human mammary MCF10A cells treated with TGF‐β or PBS (control) and evaluated their ability to separate biologically distinct cell populations based on latent features as well as standard morphological and spatial features. The participants found that most VAEs were confounded by uninformative features, with the exception of ME‐VAE, which showed strong discriminatory performance in an unsupervised setting, improved further by subsetting the latent space to the top 10 most highly variable features (Fig. 3A).
Image representation learning by VAEs and for thumbnail generation. (A) Each implementation of VAE was qualitatively assessed for their ability to distinguish control (phosphate‐buffered saline, PBS)‐treated from transforming growth factor (TGF)‐β‐treated MCF10A cells using all morpho‐spatial features or the top 10 variable (var) features compared to preselecting the top 10 discriminatory (discr) features extracted from the images. Feature space is reduced to two dimensions using UMAP embedding. Class labels of TGF‐β‐ or PBS‐treated cells are shown in pink and blue, respectively. (B) Example thumbnail images. Each panel shows a thumbnail (or associated comparative plot) generated by the methods described in the main text (panel labels). All approaches were applied to a 0.9 mm2 (9 megapixels) 9‐channel cyclic immunofluorescence (CyCIF) image of a human tonsil germinal center. Scale bars represent 100 μm in all images.
Thumbnail generation
3.2
Thumbnails are scaled down representations of larger images that enable easier viewing, and faster access, storage, and management. Since these are miniature versions of large images, challenges arise in techniques related to downscaling, compressing, and resizing these larger images, while taking into account both noise and the higher number of multiplex channels.
Four different methods were developed to compress highly multiplexed images into three‐color (e.g., RGB) thumbnails to facilitate interpretation. These made use of two pre‐existing tools as starting points: Auto‐Minerva (refer to S6 for code access link), which uses a GMM to isolate tissue foreground signal, and Miniature [53], which employs UMAP dimensionality reduction on image pixels to embed images in the CIE LCh° color space while ensuring image regions with similar protein expression patterns receive similar colors. The methods were evaluated on three CyCIF datasets that included two whole‐slide images of colorectal cancer and healthy tonsil, and one tissue microarray [10], which was acquired at 0.65 microns per pixel with a 20×/0.75NA objective lens.
The first method, called MIC (Marker Importance in Cluster; panel B in Fig. 3), builds on Miniature [53] by performing k‐means clustering on the UMAP embedding, followed by calculating a variable importance metric, which can be displayed in a compact heatmap, enabling rapid assessment of markers driving heterogeneity within an image. The heatmap is complementary to a thumbnail and aids its interpretation. In the tonsil CyCIF dataset, it was seen that the three clusters were driven primarily by Keratin, α‐SMA and Ki‐67. The next method, Max/Top3, explored a maximum intensity projection approach taking the top three most highly expressed channels at each pixel and assigning them to the red, green, and blue channels of the thumbnail. HClust is a hierarchical clustering approach to find three informative channel groups per image, within which intensities are aggregated and colored in the CIE LCh° color space. Several aggregation and color assignment methods were explored including the group hue, pixel maximum, and pixel sum, in combination with the cut point of the dendrogram. Finally, Animate generates an animation that cycles through all channels after downsizing and auto‐thresholding.
Virtual immunofluorescence staining
3.3
Although multiplex immunofluorescence staining can provide deep insights into quantitative and spatial aspects of cancerous tissues, it can be prohibitively expensive and time consuming, and repetitive processing steps risk introducing artifacts, such as tissue degradation during later cycles. Alternatively, it may be possible to infer the expression of specific protein markers from tissue autofluorescence using image‐to‐image translation, i.e., virtual staining. Ideally, by leveraging the endogenous autofluorescence of human tissues, these techniques can recreate pathology images without requiring arduous chemical staining procedures [48, 54] and can therefore expand the utility of digital pathology. An extension of this approach is the prediction of specific marker expression from unstained images—virtual IF [46, 48].
The objectives for the hackathon were to use the speedy histological‐to‐immunofluorescent translation (SHIFT) implementation [55] of the pix2pix [56] conditional generative adversarial network model to translate label‐free autofluorescence to immunofluorescence signals. The training data were whole‐slide images of healthy human kidney tissue samples with three channels of autofluorescence images and 10 different immunofluorescence stains of kidney‐specific markers acquired on the same tissue section. Training SHIFT to predict the proximal tubule marker AQP1 from the label‐free autofluorescence images produced a virtually stained image that, when analyzed for single‐cell expression of AQP1, had correlated levels of predicted expression (Pearson ρ = 0.48; S10) to those measured on the neighboring section stained for AQP1. However, the low autofluorescence detected in the nuclei was insufficient for SHIFT to predict nuclear staining.
Summary, future work, and open questions of image representation learning
3.4
VAEs are powerful tools for extracting and representing latent variables from high‐dimensional images and enabling interpretation of the learned representations. However, applying VAEs to extract biologically meaningful information from single‐cell imaging is frequently driven by unimportant or uninformative features. By using a multiple‐encoder VAE and focusing on the top 10 most variable features of images, hackathon participants were able to substantially increase the discriminatory power of VAEs. While methods like ME‐VAE can learn representations without explicit labels, more effective unsupervised learning algorithms that can discover informative features without relying on prior knowledge remain an open challenge. Representation learning methods that can capture meaningful and transferable features that generalize well across different tissue types, diseases, and multiplex imaging platforms may be useful in this regard. Ideally, the learned representations will be biologically interpretable and meaningful, which would add trust in these machine learning approaches. The development of representation learning methods that produce interpretable and explainable features is an intensely active area of research and progress along this front should provide powerful new tools for cancer image analysis.
Although several methods of thumbnail generation and channel‐to‐color associations were developed, their suitability across applications will differ. These applications may include interactive data analysis and visualization (such as Mistic [57]), online data portals [such as those belonging to HTAN, HuBMAP, and NCI's Imaging Data Commons], and local file browsers. However, any use case is likely to have a unique feature set that would require prioritizing one thumbnail generation method over another. Thus, thorough testing will be required to assess whether specific tasks can be completed with more ease and accuracy.
The automated translation of one domain (e.g., autofluorescence images) to another (e.g., immunofluorescence images) is not yet easily implemented, especially when compared to deep learning approaches for segmentation. Fine tuning of models remains critical to their successful implementation. However, success remains highly subjective, based primarily on visual inspection. An alternative approach that would provide quantitative evidence to support “success” would be to compare the outcome of downstream analyses, like cell segmentation, between a virtual and true immunofluorescence image. Many downstream analyses that currently rely on true fluorescence images could act as useful quantitative metrics for determining the accuracy of virtual staining.
Image processing at scale
4
Modern highly‐multiplexed imaging methods are capable of producing terabyte‐scale datasets, with individual images requiring dozens of gigabytes of storage and extensive resources for processing whole‐slide images in an end‐to‐end pipeline. The scale of today's images poses substantial challenges for applying existing methods originally designed on smaller datasets. Hackathon challenges designed to address this theme included optimization of existing methods to increase throughput efficiency, end‐to‐end image analysis with Galaxy [58, 59, 60], and importing tissue volumes into Neuroglancer [61], an open‐source volume visualization tool developed by Google.
Deploying image segmentation at scale
4.1
The effective application of image analysis at scale requires a multi‐pronged approach with advancements in three areas: algorithm optimization, identification and exploitation of computational parallelism, and scalable deployment on cloud infrastructure. The need for interactive visualization and human‐in‐the‐loop workflows complicates all three areas. This stands in contrast to other molecular data modalities, such as RNA sequencing, where operations like sequence alignment and variant calling can be carried out without continuous integration with data visualization and user feedback [62, 63].
Hackathon participants examined CellPose [9], a popular neural‐net‐based cell segmentation method, for ways in which its processing efficiency could be improved for applications to large datasets. Because localization of cells in images is a central step in any cancer image analysis pipeline, inefficient cell segmentation methods can present a substantial bottleneck. The participants first profiled individual aspects of the method (file input/output speeds (I/O), neural net inference, and post‐processing of model outputs) to determine computational bottlenecks. The participants also measured how the run time scales with image size and attempted to identify opportunities for parallelization. All analyses were carried out using a CyCIF tissue microarray dataset [10]. Participants determined that file I/O was not a significant bottleneck and that over ⅔ of runtime was devoted to neural network inference, which scaled linearly with image size within the limits of available memory. Graphics processing unit‐based computation within CellPose resulted in a 17.6‐fold improvement in runtime compared to central processing unit‐based processing. A further four‐fold reduction in runtime was achieved by removing CellPose's redundant models that were all trained identically but initialized from a different set of random weights. Removing these was not detrimental to segmentation accuracy, which aligns with the general view that an effective ensemble should contain weak, complementary predictors [64].
End‐to‐end image analysis with galaxy
4.2
Processing image datasets using scalable and standardized workflows enables reproducible analysis and provides harmonized and comparable results across imaging modalities. Galaxy is a web‐based platform widely used for large‐scale data processing and analysis [58, 59, 60] and is compatible with multiple computing infrastructures, which allows it to leverage cloud computing resources and Unix/Linux‐based high‐performance computer clusters to power large‐scale data analysis. By coupling image processing, analysis, and visualization, the entire workflow can be executed remotely with no need for downloading large files locally or moving data to external software. The Galaxy‐MTI tool suite has recently been integrated with the core MCMICRO [10] pipeline to enable automated and reproducible image processing and analysis but has not yet been broadly deployed. The objectives of hackathon participants were to expand the Galaxy/MCMICRO workflow to function with imaging modalities beyond CyCIF. This included preprocessing and registration of multiplexed immunohistochemistry images using PALOM [65] and comparing the segmentation output of two different algorithms (StarDist [66, 67] and Ilastik [68]) within Galaxy.
CyCIF datasets from a breast tissue microarray and tonsil, both sampled at 0.65 micrometers/pixel, were used as reference images on which the various processing steps were performed. The hackathon participants developed several pipelines and extensions to the software suite, including adding several image preprocessing/registration and segmentation algorithms, developing a new workflow optimized for non‐fluorescence‐based multiplex tissue imaging platforms, integrating PALOM for piecewise alignment of images [65], and attempting to deploy StarDist [66, 67] for 2D segmentation of CyCIF images. StarDist was mostly implemented but resulted in errors as different steps in the pipeline that require further development. To facilitate portability, all tools were built within Docker images.
Scalable visualization of 3D data using Neuroglancer
4.3
The challenge objectives were to import, process, render, and navigate multi‐channel 3D volumetric datasets in Neuroglancer [61]. Participants were given 3D datasets of highly multiplexed CyCIF [8] melanoma datasets, each spanning 500 × 500 × 5 μm, from the Pre‐Cancer Atlas. Each region was sampled at 108 nm lateral and 200 nm axial resolution. These samples displayed a wide array of histopathological features associated with early‐stage melanomas including brisk immune infiltrates, tumor regression, and pagetoid spread.
Participants converted each channel in the data into a separate 3D segmentation mask using global thresholding or machine‐learning‐based pixel classification methods. From here, the segmentation masks were converted to meshes using marching cubes [61, 69]. The meshes could then be loaded and displayed in Neuroglancer. To simultaneously look at multiple channels, participants loaded multiple meshes into a single view and overlaid them with the ability to enable or disable individual channels. In addition to 3D surface rendering, participants used a 2D slice view to render a specific slice of the multi‐volume selectable by scrolling through the third axis. The 2D view shows the precise image data but allows overlaying the segmentation mask in semi‐transparent layers (Fig. 4). Both the 2D and 3D views can be interactively linked for synchronous navigation.
Data processing and visualization pipeline developed during the challenge for Neuroglancer. Highly multiplexed cyclic immunofluorescence (CyCIF) data are stored as multi‐channel imaging volumes (top, left), where each volume represents one channel. For simplicity, volumes are depicted as single slices in this figure. Each volume is segmented, either via thresholding or more complex machine learning approaches and stored as binary segmentation volume (top, middle). Subsequently, for each segmentation volume (i.e., segmented channel) the geometry of the segmented structures is extracted and stored as a geometry mesh for subsequent three‐dimensional (3D) surface rendering (top, right). The visualization pipeline supports a slice view that can combine an original imaging volume with several segmentation volumes (bottom, left) and a 3D view (bottom, right). The 3D view can represent the volume as extracted meshes or a clipping plane. All scale bars represent 50 μm except in the slice view visualization where it represents 20 μm.
Summary, future work, and open questions
4.4
The incorporation of new tools in Galaxy is relatively straightforward due to its highly modular architecture and the use of Docker containers. However, the generation of Docker containers that are fully compatible with the Galaxy system and enable full functionality of the embedded resources required diligence. Although progress was made along multiple fronts to develop new Galaxy pipelines, none of them have been thoroughly tested and hardened. Future efforts will specifically focus on enabling users of Galaxy to apply the new implementations to various datasets to ensure their proper functionality and on training new neural net (or other deep learning) models for segmentation tasks that can be deployed more broadly.
Many of the tools and platforms are optimized for fluorescence‐based microscopy platforms, but there is a growing need to expand software tools to flexibly handle multiple imaging modalities.
Additional enhancements may include more sophisticated image registration and image preprocessing methods, robust testing, and more efficient resource allocation. Neuroglancer was found to be suitable for scaling up multi‐channel 3D datasets. However, it is still dependent on provision of high‐quality 3D segmentation masks especially where there are highly intricate biological structures. To make Neuroglancer more computationally efficient, one could adopt the more computationally efficient webGPU API and use more scalable data formats like Neuroglancer Precomputed that negate the need of loading the entire image volume into working memory (data access link provided in S6).
Overall summary
5
The preparation for the IAWG hackathon involved a critical review and summary of the many remaining challenges to automating image processing and analysis for digital pathology, some of which we attempted to specifically address within the hackathon. While the grouping of the challenges into themes was primarily out of convenience, the three main categories appear particularly relevant to most approaches that attempt to extract quantitative single‐cell measurements from the numerous techniques that have been developed to generate high‐dimensional multiplexed images of cancerous tissues. The first theme addressed challenges to cell type classification and assessment and included the detection and removal of technical artifacts from images and downstream analyses, and how to visually and quantitatively assess cell classifications. Since it is impossible to prevent all artifacts from entering into the process, it is important to acknowledge their potential for introducing errors into downstream analyses. Having tools to automatically and accurately identify these potential sources of error will enable the quantification of their potential contribution to errors in interpretation and ideally make their effects negligible.
The second theme related to translation and visual representation of spatial aspects of high‐dimensional imaging data. While the approaches considered during the hackathon focused on single‐cell variability, their further expansion to cell neighborhood and tissue‐level spatial information, as well as integration with other data modalities, will be of high importance to a better understanding of cancer. The techniques described here generally focus on translating or reducing the information from a particular modality into another domain with the primary objectives of identifying and understanding the relationships between the domains and to facilitate their visual interpretation. We expect standardizing and hardening these approaches will enhance our ability to interpret large datasets in an automated manner.
The last theme dealt with the challenges of big data and the scaling of digital image analyses to accommodate them. Prior focus on some steps in the image processing and analysis pipeline, such as cell segmentation using CellPose [9], appears to have succeeded in scaling well with increasing data. However, the quality and accuracy of cell segmentation still requires substantial optimization for different datasets, which dramatically reduces the computational efficiency gains. Resources like Galaxy [59] can assist with the process of optimizing the steps by providing a platform for executing reproducible pipelines, including quality control tests, using a multitude of computational tools in parallel. We strongly support the use of tools that enhance reproducibility and enable cross‐compatibility, especially with the abundance of different platforms and data types associated with cancer tissue imaging.
The virtual hackathon held in 2022 provided our community with the ability to make progress toward multiple challenges to fully automating the analysis of highly multiplexed tissue data—a lofty and possibly unattainable goal. However, there still remain additional open questions and areas of research that will only be amplified by the advancement of high‐throughput multiplexed imaging, computational methods, and machine learning architectures. Some of these challenges include: how best to integrate spatial imaging data from fixed tissues with other single‐cell modalities such as single‐cell sequencing data; how to analyze and provide broader access to large 3D imaging datasets; how to interpret results from spatial and neighborhood analyses, such as those applied to studying tumor‐immune microenvironment interactions; and how to link the static fixed tissue imaging data to our understanding of the highly dynamic evolutionary process that is cancer.
Conflict of interest
The authors declare no conflict of interest.
Author contributions
A list of authors and non‐author contributors (hackathon participants), including affiliations, roles, and ORCID, is provided in S7.
Supporting information
S1. Resources developed within the Image Analysis Hackathon 2022 addressed different aspects of the typical image analysis pipeline for fixed cancer tissues.
S2. Overview of approach to address challenges to automatic artifact detection.
S3. Artifact Classifier Performance.
S4. Comparison of cell data before and after application of REDSEA.
S5. Detailed heatmap for marker expressions at single‐cell resolution with color bars representing silhouette coefficients and cluster memberships.
S6. Details of the hackathon, Criteria for team presentations of hackathon performance, List of code repositories (web links), List of abbreviations and acronyms.
S7. Complete contributor list, including all hackathon participants.
S8. List of features (FS1, FS2, FS3) from the Automatic detection of artifacts from spatial feature table challenge used for model training to detect artifacts. A suffix of _c and _b in the feature set name indicates whether it considers pixels from within the nuclei segmentation boundary or the outside (background) respectively.
S9. Percentage of cells for each cell cluster across the different clustering methods used in the Analysis of cell type classification challenge.
S10. Demonstration of virtual staining of AQP1.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Wagner RP . Rudolph Virchow and the genetic basis of somatic ecology. Genetics. 1999;151(3):917–920.10049910 10.1093/genetics/151.3.917PMC 1460541 · doi ↗ · pubmed ↗
- 2Hajdu SI . A note from history: landmarks in history of cancer, part 4. Cancer. 2012;118(20):4914–4928.22415734 10.1002/cncr.27509 · doi ↗ · pubmed ↗
- 3The human body at cellular resolution: The NIH human biomolecular atlas program. Nature. 2019;574(7777):187–192.31597973 10.1038/s 41586-019-1629-x PMC 6800388 · doi ↗ · pubmed ↗
- 4Smith JM , Conroy RM . The NIH common fund human biomolecular atlas program (Hu BMAP): building a framework for mapping the human body. FASEB J. 2018;32:818.
- 5Regev A , Teichmann SA , Lander ES , Amit I , Benoist C , Birney E , et al. The human cell atlas. e Life. 2017;6:e 27041.29206104 10.7554/e Life.27041 PMC 5762154 · doi ↗ · pubmed ↗
- 6Rozenblatt‐Rosen O , Stubbington MJ , Regev A , Teichmann SA . The human cell atlas: from vision to reality. Nature. 2017;550(7677):451–453.29072289 10.1038/550451 a · doi ↗ · pubmed ↗
- 7Osumi‐Sutherland D , Xu C , Keays M , Levine AP , Kharchenko PV , Regev A , et al. Cell type ontologies of the human cell atlas. Nat Cell Biol. 2021;23(11):1129–1135.34750578 10.1038/s 41556-021-00787-7 · doi ↗ · pubmed ↗
- 8Lin JR , Izar B , Wang S , Yapp C , Mei S , Shah PM , et al. Highly multiplexed immunofluorescence imaging of human tissues and tumors using t‐Cy CIF and conventional optical microscopes. e Life. 2018;7:e 31657. 10.7554/e Life.31657 29993362 PMC 6075866 · doi ↗ · pubmed ↗
