SM3DD with segmented PCA: a comprehensive method for interpreting 3D spatial transcriptomics
Tony Blick, Aaron Kilgallon, James Monkman, Caroline Cooper, Chin Wee Tan, Emily E Killingbeck, Liuliu Pan, Youngmi Kim, Yan Liang, Andy Nam, Michael Leon, Paulo S F Guimaraes, Seigo Nagashima, Ana P C Martins, Cleber Machado-Souza, Lucia de Noronha, John F Fraser

TL;DR
The paper introduces SM3DD, a new method for analyzing 3D spatial RNA data to compare lung tissues from healthy individuals and SARS-CoV-2 patients.
Contribution
SM3DD is a novel, segmentation-free approach that enables pathway discovery in spatial transcriptomics without requiring cell annotation.
Findings
SM3DD identified significant gene distance changes in SARS-CoV-2 patients compared to normal individuals.
Hierarchical clustering of SM3DD results grouped genes by function, aiding biological interpretation.
Segmented PCA revealed pathways like 'SARS-CoV-2 infection' despite no viral transcripts being measured.
Abstract
We developed Standardised Minimum 3D Distance (SM3DD), an entirely cell segmentation/annotation-free approach to the analysis of spatial RNA datasets, using it to compare lung tissue from 16 clinically normal individuals to that of 18 SARS-CoV-2 patients who died from acute respiratory distress syndrome. RNA spatial coordinates were determined using the CosMx™ Spatial Molecular Imager (Bruker Spatial Biology, US). For each individual transcript location, we calculated the three-dimensional distances to the nearest transcript of each transcript type, standardising the distances to each transcript type. Mean SM3DDs were compared between normal and SARS-CoV-2 patients. Notably, hierarchical clustering of the directional log10(P) values organized genes by functionality, making it easier to interpret biological contexts, and for FKBP11, where a decrease in distance to MZT2A was the most…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6- —Wesley Research Institute10.13039/100008613
- —Australian Academy of Sciences
- —University of Queensland10.13039/501100001794
- —Medical Research Future Fund10.13039/501100025520
- —Queensland Children's Hospital10.13039/100011302
- —Richie's Rainbow Foundation
- —Cooper Rice-Brading Foundation, Australia
- —PA Research Foundation10.13039/100009844
- —National Breast Cancer Foundation10.13039/501100001026
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSingle-cell and spatial transcriptomics · Immune responses and vaccinations · RNA regulation and disease
Introduction
Acute respiratory distress syndrome (ARDS) is, unsurprisingly, a serious complication of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. A global survey of ARDS incidence and outcomes in hospitalised SARS-CoV-2 patients found that, in a period prior to widespread vaccine use from the beginning of the pandemic through to the end of July 2020, approximately one third developed ARDS, and a mortality rate among ARDS-affected SARS-CoV-2 patients of 45% [1]. Pulmonary damage from SARS-CoV-2 occurs through local and systemic inflammatory host responses, as well as through direct viral insult, with autopsy studies reporting alveolar damage and flooding, platelet microthrombi, and extensive remodelling, along with fibrotic evolution and variable collagen fibre deposition [2].
Spatially resolved transcriptomics, Method of the Year 2020, along with Spatial Proteomics, Method of the Year 2024, have transformed the analysis of tissue samples [3, 4]. Currently, analysis methods for spatially resolved datasets predominantly rely on cell segmentation algorithms that are either geometry- or deep learning-based, but do not make use of transcriptomic data. These approaches rely heavily on nuclear detection with diminishing accuracy for tissue samples exhibiting a high degree of heterogeneity in cell sizes, shapes, orientations, and densities within the extracellular matrix (ECM). Amongst other concerns, inaccurate cell segmentation can lead to mixed-cell expression profiles and subsequently errors in both cell type annotation and cell type-specific differential expression analysis.
There are currently few published methods for the analysis of large spatially resolved transcriptomic datasets that don’t rely on spatial restriction approaches to cell segmentation prior to the assignment of expression data to cells. Spot-based Spatial cell-type Analysis by Multidimensional mRNA density estimation (SSAM) instead assigns cell types at a pixel level based on a sliding window analysis of the spatially resolved transcriptomic data, resulting in improved cell type detection and assignment [5]. Factor Inference of Cartographic Transcriptome at Ultra-high Resolution (FICTURE) also makes use of the spatial expression data to improve cell segmentation accuracy, deriving fine-scale tissue structure by identifying factors derived from gene expression patterns that fit to a Latent Dirichlet Allocation (LDA) model and adaptively aggregating pixel-level information using anchors [6]. The graph neural network model GraphSAGE [7] has been applied without cell segmentation to a pulmonary fibrosis spatial RNA dataset to describe molecular niches [8].
We present here Standardised Minimum 3D Distance (SM3DD), a novel cell segmentation/annotation-free analytical methodology for large spatially resolved transcriptomic datasets, which makes full use of the three-dimensional nature of each field of view (FOV) within the datasets, a benefit over current analysis approaches that flatten the data by ignoring transcript Z-coordinates. Simplistically, within each FOV, the three-dimensional distance to the nearest transcript of each transcript type is calculated for every transcript location in the dataset. Within each FOV, the distances to each transcript type are standardized. For each sample, the mean SM3DD for each directional transcript type-to-transcript type pair is calculated, prior to a comparison between sample groups. Additionally, a principal components analysis (PCA) of the entirety of the SM3DDs is performed, segmenting the analysis according to the transcript type from which the 3D distances were determined, and mean differences in dimension scores across the PCA segments calculated, at which stage pathway-level analysis can be applied. We used SM3DD to compare RNA spatial coordinates, generated using NanoString’s CosMx™ Spatial Molecular Imager 1000-plex assay [9], of lung tissue from 16 normal cases to that from 18 SARS-CoV-2 patients who died from ARDS. We further demonstrate the broad utility of SM3DD by applying it to a publicly-available Xenium 343-plex spatial RNA dataset for matched fibrotic regions from 8 patients with pulmonary fibrosis [8]. We demonstrate that SM3DD facilitates detailed pathway-level interpretation of spatial transcriptomic data.
Material & Methods
CosMx™ Spatial Molecular Image analysis of tissue microarrays
Tissue blocks of rapid autopsy lung samples from 18 SARS-CoV-2 patients, confirmed by RTqPCR of nasopharyngeal swabs, who died from ARDS, were reviewed by an anatomical pathologist and 2 tissue microarrays (TMAs) constructed from 30 lung tissue cores. Autopsy samples were obtained from the Pontificia Universidade Catolica do Parana PUCPR the National Commission for Research Ethics (ethics protocol number 3.944.734/2020), with family-permitted post-mortem biopsies and ratification by the University of Queensland Human Research Ethics Committee. A normal lung TMA (‘LCN241’, US Biomax, USA) was included as a comparator sample. Adjacent serial tissue sections from all TMAs were profiled using the CosMx™ Spatial Molecular Imager (SMI) 1000-plex assay (Bruker Spatial Biology, USA). The volumetric space of each FOV was 656 μM x 985 μM x 6.4 μM. For point of reference, the volumetric depth spans the equivalent of 36 pixel-widths. CosMx SMI assigned transcript Z-coordinates to 9 evenly-spaced discrete layers across the 6.4 μM. FOVs were in matching locations across adjacent serial tissue sections.
Calculation of SM3DD
Within each FOV, for all transcript locations, the spatial coordinates, adjusted for Z scaling, were used to calculate the 3D distances to the nearest transcript of each transcript type. These distances were standardised within each FOV to correct for overall expression level differences and signal strength effects by dividing the distances to each transcript type by the mean of the distances to that transcript type. Z scale adjustment is required for SMI data, as the Z coordinate data is given as an image layer number rather than as a distance and involves multiplying by the ratio of the known physical distance between layers and the pixel width. Z-coordinates have a continuous distribution in Xenium datasets and do not require Z scale adjustment. The method does not filter for sparse transcripts, while directional transcript distances to sparse transcript types are unlikely to be informative, directional transcript distances from sparse transcript types are potentially biologically informative.
Mean SM3DD comparison
For all FOVs from each sample, across both same and adjacent sections, the SM3DDs calculated within individual FOVs were pooled, and sample-level means for each transcript type-to-transcript type SM3DD were determined. Differences between mean SM3DDs for sample groups were assessed by t-test, with the false discovery rate (FDR) controlled by the two-stage Benjamini, Krieger, and Yekutieli procedure [10], implemented by the MATLAB function fdr_bh [11]. While the procedure is only guaranteed to control the FDR for independent tests, simulations indicate that it can also control the FDR of positively correlated tests. Directional log10(P values) were displayed as a clustered heatmap, with ‘from’ and ‘to’ transcript type on alternate axes. For prominent clusters of statistically shorter mean SM3DDs, sets of clustered genes were assessed by Fisher’s Exact Test with FDR control for pathway-level effects, utilising GeneCard Suite’s PathCard database [12]. PathCards is a consolidation of pathways from twelve databases: Reactome, KEGG, PharmGKB, WikiPathways, QIAGEN, HumanCyc, Pathway Interaction, Tocris Bioscience, GeneGO, Cell Signaling Technologies, R&D Systems, and Sino Biological.
Segmented PCA
A PCA approach was implemented, segmenting the analysis of all SM3DDs according to the transcript type from which they were determined. For each transcript-specific segment, the difference in means of the PCA scores between transcripts in each sample group was determined within each PCA dimension. These transcript-specific differences were then grouped by PCA dimension and assessed for pathway-level effects. Across each PCA dimension, the transcript-specific differences for genes in each PathCard SuperPathway, a composite of one or more pathways from other databases, were compared by t-test to those for all other genes. The FDR was controlled by the two-stage Benjamini, Krieger, and Yekutieli procedure, both across each PCA dimension and, within each SuperPathway, across all PCA dimensions.
Computational Resources
SM3DD can be run on standalone computers. On an Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz, 16 cores, 256 Gb, SM3DD took approximately 5 days to complete the SARS-CoV-2 analysis. As SM3DD uses parallel processing, the computation time on an HPC would be markedly faster, primarily dependent upon the number of available cores.
Results
CosMx™ Spatial Molecular Image analysis of tissue microarrays
TMAs were constructed from 30 lung tissue cores obtained at autopsy from 18 SARS-CoV-2 patients who died from ARDS. A normal lung TMA (‘LCN241’, US Biomax, USA) was included as a comparator sample. Adjacent serial tissue sections from all TMAs were profiled using the CosMx™ Spatial Molecular Imager (SMI) 1000-plex assay (Bruker Spatial Biology, USA), with data collected from a total of 116 FOVs from the SARS-CoV-2 patient lung samples and 38 FOVs across 16 independent normal lung samples. Within each section, data were collected from two non-overlapping FOVs for each core. Four of the FOVs from one core were excluded from the analysis due to low transcript counts, resulting in a final total of 98 815 173 transcripts.
Calculation of SM3DD
Within each FOV, for all transcript locations, the 3D distances to the nearest transcript of each transcript type were calculated, resulting in 96 838 869 540 directional transcript-to-transcript distances. These distances were standardised within each FOV to correct for overall expression level differences and signal strength effects (Fig. 1). Transcript distances are only calculated within individual FOVs; they are not calculated across matching FOVs from the same sample, either on the same or adjacent sections.
Method overview. The 3D distances to the nearest transcript of each transcript type (M3DD) are determined for all transcript locations. These distances are standardised (SM3DD) within each FOV by dividing by the mean of the distance to each transcript type. Sample means are determined and compared between groups by t-test, and the directional log10 (P) values are clustered. Prominent clusters are assessed by Fisher’s Exact Test for pathway-level effects. SM3DDs are also segmented according to the transcript type from which they were determined, and PCA is performed. Within each PCA dimension, differences in mean PCA scores between groups are calculated for each transcript, and these differences are assessed by t-test for pathway-level effects.
Mean SM3DD comparison between SARS-CoV-2 and normal
For all FOVs from each sample, across both same-slide and adjacent sections, the SM3DDs calculated within individual FOVs were pooled, and sample-level means for each transcript type-to-transcript type SM3DD were determined. Differences between mean SM3DDs for SARS-CoV-2 and normal samples were assessed, with false discovery rate (FDR) control. Approximately 45% (431 569) of all transcript type-to-transcript type mean SM3DDs differed between SARS-CoV-2 and normal samples for an FDR cut-off of 5% (Fig. 2A). Statistically, the most marked difference was the overall shorter FKBP prolyl isomerase (FKBP11)-to-mitotic spindle organizing protein 2A (MZT2A) transcript SM3DD in SARS-CoV-2 samples (P = 1.06 × 10^−17^). Across all mean SM3DDs specifically to MZT2A, this difference was very distinct (Supplementary Fig. S1). An overall shorter mean SM3DD to FKBP11 transcripts from nuclear paraspeckle assembly transcript 1 (NEAT1) was statistically ranked 13^th^ (P = 2.04 × 10^−13^), which is notable as NEAT1 is non-coding and nuclear localised.
Comparison of transcript distances between normal and SARS-CoV-2-infected lungs. (A) Volcano plot of all directional transcript distance comparisons. The x-axis is the distance ratio, where larger values represent shorter distances in SARS-CoV-2-infected lungs. The y-axes are negative log10 (P values). (B) Clustering of directional log10 (P values) from comparison of transcript distances between normal and SARS-CoV-2-infected lungs. Scalebar indicates directional log10 (P) value. Prominent clusters of transcripts with shorter distances (red) in SARS-CoV-2-infected lungs were assessed by pathway analysis, identifying that these likely represent widespread interferon γ signalling (dark green box) and ECM deposition (pink box), along with pathogen phagocytosis (light green box), ‘SARS-CoV-2 infection’ and an extensive list of predominantly signalling pathways (blue box) and, although not statistically significant, what likely represents the perturbation of immune targeting of self (black box). (C) Expanded view of dendrogram section in the purple boxed region in (B), showing that transcripts cluster by functionality. Position of FKBP11 clustering with both interferon α and β receptor subunit 2 (IFNAR2) and interferon γ receptor 2 (IFNGR2).
To assess the propensity of the approach to report differences unrelated to sample group definitions, samples were randomly assigned into 2 groups of 16, each with 8 normal and 8 SARS-CoV-2 samples, and the number of transcript type-to-transcript type mean SM3DDs that passed FDR-control was determined. Across 1000 reiterations of randomising samples, 3 reiterations reported that only 1 of 960 400 mean SM3DDs comparisons passed FDR-control, while the other 997 reiterations reported none.
To better visualize the results of the mean SM3DD comparison between SARS-CoV-2 and normal (Extended Data Table 1), we displayed the directional log10(P values) as a clustered heatmap, with ‘from’ and ‘to’ transcript type on alternate axes (Supplementary Fig. S2). The extent to which transcripts with shared function clustered together with very small cophenic distances was striking and most obvious for gene family members (Fig. 2C). Notably, FKBP11 clustered with both interferon α and β receptor subunit 2 (IFNAR2) and interferon γ receptor 2 (IFNGR2), while MZT2A clustered with the long noncoding RNA Metastasis Associated Lung Adenocarcinoma Transcript 1 (MALAT1), which is nuclear localised.
For prominent clusters of statistically shorter mean SM3DDs in SARS-CoV-2 samples, sets of clustered genes were assessed by Fisher’s Exact Test with FDR control for pathway-level effects, utilising GeneCard Suite’s PathCard database [12]. There are two bands where the mean SM3DDs are predominantly shorter for clusters of transcripts from which the SM3DDs were determined (FROM-Transcripts), indicating widespread effects. Pathway analysis identified that these likely represent ‘interferon γ signalling’ and ECM deposition (SuperPathways ‘miRNA targets in ECM and membrane receptors’, ‘ECM proteoglycans’, and ‘extracellular matrix organization’) (Fig. 2B, Extended Data Tables 2–3). Assessment of transcript density for the five main induced collagen types confirmed extensive ECM deposition by pan-cytokeratin negative cells (Fig. 3A), while an assessment of the three interferon-induced transcripts in the assay confirmed widespread signalling (Fig. 3B).
Transcript density across SARS-CoV-2-infected lung. Tessellation (white) of transcript density distribution for (A) COL1A1/COL1A2/COL3A1/COL6A2/COL6A3 showing ECM deposition by pan-cytokeratin negative cells, (B) IFI27/IFITM1/IFITM3 showing widespread interferon signalling, and (C) CD68/CD163/FCER1G/FCGR3A/GLUL/HLA-DPB1/HLA-DRB5/ITGB2/SPP1 showing a high density of cells involved in pathogen phagocytosis. Pseudocoloring of pan-cytokeratin staining (red) and DAPI nuclear staining (blue) is shown.
For one region of the clustered heatmap with statistically shorter mean SM3DDs in SARS-CoV-2 samples, pathway analysis reported an increase in microglia pathogen phagocytosis (‘phosphorylation of CD3 and TCR zeta chains’, ‘microglia pathogen phagocytosis pathway’ and ‘TCR signalling (REACTOME)’ for the cluster of FROM-Transcripts, and for the cluster of TO-Transcripts, ‘microglia pathogen phagocytosis pathway’ and ‘TYROBP causal network in microglia’). This result is limited, though, by the absence of any markers in the assay capable of distinguishing microglia from macrophages (Fig. 2B, Extended Data Tables 4–5). Transcript density mapping for nine of the clusters of TO-Transcripts shows a high density of expressing cells, consistent with macrophage infiltration (Fig. 3C).
For one large cluster of TO-Transcripts, 107 SuperPathways passed FDR control. Notably, the third statistically strongest after ‘cellular responses to stimuli’ and ‘infectious disease’ was ‘SARS-CoV-2 infection’ (P = 5.44 × 10^−8^). Largely, the list of SuperPathways for this cluster likely represents a conglomerate of overlapping and interacting signalling pathways active or affected in SARS-CoV-2 infection (Fig. 2B, Extended Data Table 6).
For both another large cluster of TO-Transcripts and the cluster of FROM-Transcripts that specifically have shorter SM3DDs to them, no SuperPathways passed FDR control. Interestingly, though, the cophenic distances across this cluster of FROM-Transcripts are notably shorter, and functional similarities between several of the SuperPathways identified without FDR control suggest that this region of the heatmap may represent the perturbation of immune targeting of self (‘perturbations to host-cell autophagy, induced by SARS-CoV-2 proteins’ for FROM-Transcripts and ‘cancer immunotherapy by CTLA4 blockade’ / ‘FOXP3 in COVID-19′ for TO-Transcripts) (Fig. 2B, Extended Data Tables 7–8).
Segmented PCA facilitates pathway identification
As any given transcript type may exist in multiple contexts within a tissue sample, we also implemented a PCA approach, segmenting the analysis of all SM3DDs according to the transcript type from which they were determined. For each of the 980 transcript-specific segments, the difference in means of the PCA scores between transcripts in SARS-CoV-2 and normal samples was determined within each PCA dimension. These transcript-specific differences were then grouped by PCA dimension and assessed for pathway-level effects with FDR control. This approach identified SuperPathways for the first four PCA dimensions, but not the six subsequent dimensions, as being affected by SARS-CoV-2 infection. This unsupervised approach identified SuperPathways self-evidently affected by SARS-CoV-2 infection, such as ‘infectious disease’, ‘innate immune system’, ‘interferon gamma signalling’, and ‘SARS-CoV-2 infection’, along with several related to ECM deposition, muscle contraction, fatty acid biosynthesis, nerve growth cone collapse, and O_2_/CO_2_ exchange in red blood cells (Fig. 4). Mapping of muscle-specific transcripts to FOV images identified that ‘muscle contraction’ is likely a batch effect, probably caused by differences in sampling locations between the two groups (data not shown).
SARS-CoV-2-related SuperPathways identified by segmented PCA. PathCard SuperPathways identified for the first four PCA dimensions. PCA of all SM3DDs was segmented according to the transcript type from which they were determined. Differences in mean PCA scores between transcripts in SARS-CoV-2 and normal samples were determined within each PCA dimension. These transcript-specific differences were then grouped by PCA dimension and assessed for pathway-level effects with FDR control.
Fibrotic Lung
To assess the versatility of SM3DD, we applied it to an intra-sample comparison of a lower-plex dataset with less divergent sample groups, choosing a publicly-available Xenium 343-plex spatial RNA dataset that included matched ‘more’ and ‘less’ fibrotic regions from 8 patients with pulmonary fibrosis [8]. Clustering the results of a pair-wise mean SM3DD comparison (Extended Data Table 9) also lead to multiple instances of functionally-related transcripts clustering together despite having larger cophenic distances, such as AKR1C1 and AKR1C2; CD3D, CD3E and CD3G; CD8A and CD8B; HLA-DQA1 and HLA-DQB1; IFIT2 and IFIT3; IL1A and IL1B; KRT8 and KRT18; KRT15 and KRT17; and a proliferation-specific cluster CDK1, CCNB2, CENPF, TOP2A and MKI67. Pathway analysis of both the FROM- and TO-transcript clusters for the single prominent region of the heatmap with statistically shorter mean SM3DDs in ‘more’ fibrotic regions identified it as representing an unfolded protein response (Fig. 5, Supplementary Fig. S3, Extended Data Tables 10–11).
Comparison of transcript distances between ‘more’ and ‘less’ pulmonary fibrosis. (A) Volcano plot of all directional transcript distance comparisons. The x-axis is the distance ratio, where larger values represent shorter distances in regions that are more fibrotic. The y-axes are negative log10 (P values). (B) Clustering of directional log10 (P values) from comparison of transcript distances between ‘more’ and ‘less’ pulmonary fibrosis. Scalebar indicates directional log10 (P) value. The prominent cluster of transcripts with shorter distances (red) in more fibrotic regions was assessed by pathway analysis, identifying an unfolded protein response (black box).
Pathway analysis of transcript-specific differences in scores from the segmented PCA identified SuperPathways for the first two dimensions (Fig. 6). Four SuperPathways were identified for the first dimension, ‘defective CSF2RA causes {pulmonary surfactant metabolism dysfunction 4} SMDP4’ and the related ‘surfactant metabolism’, along with ‘FOXA1 transcription factor network’ and ‘prostaglandin synthesis and regulation’. Of the three SuperPathways identified for the second dimension, the strongest statistically was ‘triglyceride metabolism’ (P = 1.66 × 10^−14^), with the others both being related to it, ‘familial partial lipodystrophy’ and the ‘PPAR signalling pathway’.
Pulmonary fibrosis-related SuperPathways identified by segmented PCA. PathCard SuperPathways identified for the first two PCA dimensions. PCA of all SM3DDs was segmented according to the transcript type from which they were determined. Differences in mean PCA scores between transcripts in matched ‘more’ and ‘less’ fibrotic regions from eight patients with pulmonary fibrosis were determined within each PCA dimension. These transcript-specific differences were then grouped by PCA dimension and assessed for pathway-level effects with FDR control.
Discussion
Approximately 45% of all transcript type-to-transcript type mean SM3DDs differed between SARS-CoV-2 and normal samples. This relatively large proportion is likely a reflection of the extent of immune infiltration and innate and adaptive immunity signalling along with changes due to the chronic infection itself. A decrease in the mean SM3DD from FKBP11 to MZT2A transcripts was the most statistically significant difference between SARS-CoV-2 and normal samples. Clustering of directional log10(P) values from mean SM3DDs comparisons grouped genes by functionality. FKBP11 clustered with interferon receptor subunits, implicating a potential role in SARS-CoV-2 infection for FKBP11 in either the facilitation or regulation of interferon signalling, which is consistent with previous findings in osteoblasts, where involvement of an FKBP11-CD81-FPRP complex in the expression of interferon-induced genes was reported [13]. While not conclusive, a markedly shorter mean SM3DD from nuclear localised NEAT1-to-FKBP11, along with clustering of MZT2A with another nuclear localised lncRNA, MALAT1, alludes to the possibility that SM3DD may be detecting nuclear localised of FKBP11 mRNA during SARS-CoV-2 infection.
Pathway analysis of clusters coinciding with prominent regions of the heatmap exhibiting statistically shorter mean SM3DDs identified multiple biological processes affected by SARS-CoV-2 infection, with a high degree of consistency with previous studies. Widespread interferon γ signalling was identified by both the mean SM3DD and segmented PCA analyses and is known to be particularly associated with SARS-CoV-2-induced ARDS and fatal outcomes (reviewed in [14]). Widespread ECM deposition was also identified by both analyses, consistent with other studies of autopsy samples from SARS-CoV-2 induced ARDS patients that reported extensive ECM remodelling with a prominent increase in collagen deposition [15].
While ‘microglia pathogen phagocytosis’ was identified, the assay does not include any microglia-specific markers, so this result may instead represent macrophage phagocytosis. Previous studies have reported the accumulation of inflammatory, profibrotic macrophages in the lung during SARS-CoV-2 infection (reviewed in [14]) and in post-acute SARS-CoV-2 patients with persistent respiratory symptoms, the abundance of alveolar macrophages correlated with the severity of fibrosis, indicative of failed lung repair [16].
Although not passing FDR control, a distinct region of the clustered results of the mean SM3DD comparison resulted in functional similarities between several SuperPathways related to the perturbation of immune targeting of self, which included SARS-CoV-2-induced perturbation of autophagy and FOXP3 in SARS-CoV-2 infection. Multiple SARS-CoV-2 proteins are known to perturb host innate and adaptive immune systems (reviewed in [17]). SARS-CoV-2 ORF3 specifically dysregulates autophagy [18] and pharmacological induction of autophagy reduced SARS-CoV-2 propagation in both primary human lung cells and intestinal organoids [19], while SARS-CoV-2′s S-protein dysregulates FOXP3-induced regulatory T cell development (reviewed in [20]). The inclusion of ‘cancer immunotherapy by CTLA4 blockade’ in the associated SuperPathways alludes to possible overlap between viral and cancer cell-induced immune perturbation.
While it has previously been reported that SARS-CoV-2 can infect and spread through the peripheral nerve system [21], this is the first report indicating that nerve growth cone collapse may be occurring in SARS-CoV-2-affected lung autopsy tissue. As neuropilin-1 can facilitate SARS-CoV-2 cell entry and infectivity [22] and has been shown to mediate semaphorin D/III-induced growth cone collapse [23], we speculate that nerve growth cone collapse may involve viral protein interaction with neuropilin-1.
Pathway-level analysis indicated that ‘O_2_/CO_2_ exchange in red blood cells’ was affected by SARS-CoV-2 infection. While SM3DD does compute distances from hemoglobin transcripts in erythrocytes, the underlying differences found here are more likely to have been driven by changes in alveolar epithelial cells, which can be derived from hematopoietic stem cells and have been shown to express haemoglobin [24].
Pathway analysis of scores from the segmented PCA revealed fatty acid biosynthesis as being affected by SARS-CoV-2 infection. Lipids have been shown to accumulate in lungs and cells infected with SARS-CoV-2, and plasmid lipid pattern alterations correlate with disease progression and severity (reviewed in [25]). Pharmacological inhibition of fatty acid synthesis for the treatment of SARS-CoV-2 infection was proposed following results where it blocked SARS-CoV-2 replication and improved survival in a mouse model [26].
The importance of surfactant homeostasis dysfunction in pulmonary fibrosis is well established [27], leading to the accumulation of floccular material in the alveoli and activation of an unfolded protein response [28], the latter of which was also identified here through pathway-level analysis of the clustered mean SM3DD results. Pulmonary surfactant is a lipid/protein mixture that reduces surface tension at the air-liquid interface and is recycled by alveolar macrophages following granulocyte-macrophage colony-stimulating factor receptor (CSF2R) signalling [29]. Pathway-level analysis of segmented PCA score differences between matched ‘more’ and ‘less’ pulmonary fibrotic regions identified a change in the extent of surfactant metabolism dysfunction caused by defective CSF2R signalling, which can be due to neutralizing autoantibodies to CSF2 [30] or hereditary mutations in either subunit [31].
Pulmonary surfactant is ∼90% lipid by weight [29]. The segmented PCA approach described here identified triglyceride metabolism as being differentially affected between matched ‘more’ and ‘less’ pulmonary fibrotic regions, along with two well-described regulators of lipid metabolism, FOXA1 and PPAR signalling [32, 33]. This is consistent with recent studies that have shown a strong association between lipid metabolism and both the onset and progression of pulmonary fibrosis, through the promotion of apoptosis and the induction of both pro-fibrotic biomarkers and endoplasmic reticulum stress [34], the latter of which can lead to an unfolded protein response.
Prostaglandin synthesis and regulation were also identified as being differentially affected between matched ‘more’ and ‘less’ pulmonary fibrotic regions. This is concordant with previous studies that have reported on the role of prostaglandins in limiting pathological features of lung fibrosis, such as TGFβ-induced myofibroblast differentiation, along with lung fibroblast migration, proliferation, and collagen deposition [35].
At a cellular level, the data provided by spatial transcriptomics is low density. SM3DD overcomes the limitations inherent in low-density data by leveraging extreme numbers of transcript-to-transcript distances. This SARS-CoV-2 dataset has previously been analysed using cell segmentation approaches, reporting that M1 macrophages have elevated expression of IFI27 in areas of SARS-CoV-2 infection [36]. The breadth of findings reported here demonstrates that applying SM3DD contrasts favorably to using cell segmentation alone approaches. Pathway analysis of SM3DD’s output identified ‘SARS-CoV-2 infection’ without the assay including any SARS-CoV-2 transcripts, underscoring the power of the approach. We have shown that a segmented PCA approach has sufficient similarity across the transcript-specific segments for the first few PCA dimensions to facilitate pathway-level analysis. The overall approach can assess changes in cell types typically missed by cell segmentation methods that rely on nuclear detection, such as erythrocytes and peripheral nerves, demonstrating here that SM3DD can lead to descriptions of peripheral nerve cell pathology without requiring either nerve cell segmentation or annotation, highlighting SM3DD’s utility in spatial transcriptomics. The signalling pathway and metabolic changes determined here from SM3DD analysis of the pulmonary fibrosis dataset were not reported in the dataset’s original publication [8], underscoring the value of including SM3DD as a complementary analysis. The power of SM3DD to discover new biology, from either inter- or intra-group comparisons, will likely grow with the advent of whole-transcriptome spatial assays [37].
Supplementary Material
lqag007_Supplemental_Files
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Tzotzos SJ, Fischer B, Fischer H. et al. Incidence of ARDS and outcomes in hospitalized patients with COVID-19: a global literature survey. Crit Care. 2020;24:516. 10.1186/s 13054-020-03240-7.32825837 PMC 7441837 · doi ↗ · pubmed ↗
- 2Bösmüller H, Matter M, Fend F. et al. The pulmonary pathology of COVID-19. Virchows Arch. 2021;478:137–50.33604758 10.1007/s 00428-021-03053-1PMC 7892326 · doi ↗ · pubmed ↗
- 3Nature . Method of the Year 2024: spatial proteomics. Nat Methods. 2024;21:2195–6. 10.1038/s 41592-024-02565-3.39643689 · doi ↗ · pubmed ↗
- 4Marx V. Method of the Year: spatially resolved transcriptomics. Nat Methods. 2021;18:9–14. 10.1038/s 41592-020-01033-y.33408395 · doi ↗ · pubmed ↗
- 5Park J, Choi W, Tiesmeyer S. et al. Cell segmentation-free inference of cell types from in situ transcriptomics data. Nat Commun. 2021;12:3545. 10.1038/s 41467-021-23807-4.34112806 PMC 8192952 · doi ↗ · pubmed ↗
- 6Si Y, Lee C, Hwang Y. et al. FICTURE: scalable segmentation-free analysis of submicron-resolution spatial transcriptomics.Nat Methods. 2024;21:1843–54. 10.1038/s 41592-024-02415-2.39266749 PMC 11466694 · doi ↗ · pubmed ↗
- 7Hamilton W, Ying Z, Leskovec J. Inductive representation learning on large graphs. Adv Neural Inform Process Syst. 2017;30.
- 8Vannan A, Lyu R, Williams AL. et al. Spatial transcriptomics identifies molecular niche dysregulation associated with distal lung remodeling in pulmonary fibrosis. Nat Genet. 2025;57:647–58. 10.1038/s 41588-025-02080-x.39901013 PMC 11906353 · doi ↗ · pubmed ↗
