# Topological Data Analysis for Unsupervised Feature Selection in Large Scale Spatial Omics Data Sets

**Authors:** James Boyle, Gregory Hamm, Eleanor Williams, Robin JG Hartman, Magnus Söderburg, Ian Henry, Michael Casey

PMC · DOI: 10.1007/s11538-026-01618-2 · Bulletin of Mathematical Biology · 2026-03-04

## TL;DR

This paper applies persistent homology, a method from topological data analysis, to quantify the spatial structure of each gene's expression in large spatial omics datasets. The resulting continuous score is used to identify spatially variable genes (SVGs) in public kidney disease and myocardial infarction data.

## Contribution

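The paper introduces an unsupervised feature selection method for spatial omics data that uses persistent homology to give each gene a continuous measure of spatial structure, as an alternative to p-value based SVG identification.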

## Key findings

- Persistent homology provides a continuous quantification of spatial structure in gene expression.
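- The method yields biologically meaningful insights into kidney disease and myocardial infarction from public spatial transcriptomics data.
- Because homology is non-parametric, the approach extends naturally to other spatial omics modalities, as demonstrated on a spatial metabolomics sample.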
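
To make the first finding concrete, here is a minimal sketch of how persistent homology can turn one gene's spatial expression into numbers. This summary view does not state which filtration or summary statistic the authors use, so everything below is an assumption made for illustration: expression sits on a regular 4-connected grid, the filtration is by superlevel sets, only 0-dimensional features (connected components) are tracked, and the summary is total persistence. The names `superlevel_persistence` and `total_persistence` are hypothetical, not taken from the paper.

```python
def superlevel_persistence(grid):
    """0-dimensional persistence pairs (birth, death) of a superlevel-set
    filtration on a 4-connected grid. Illustrative assumption, not the
    paper's pipeline."""
    rows, cols = len(grid), len(grid[0])
    # Visit sites from highest to lowest expression value.
    order = sorted(
        ((grid[r][c], r, c) for r in range(rows) for c in range(cols)),
        key=lambda t: -t[0],
    )
    parent = {}  # union-find forest over sites already in the filtration
    birth = {}   # birth value, read only at component roots
    pairs = []

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for value, r, c in order:
        node = (r, c)
        parent[node] = node
        birth[node] = value
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (r + dr, c + dc)
            if nb not in parent:
                continue  # off-grid, or not yet entered the filtration
            survivor, dying = find(node), find(nb)
            if survivor == dying:
                continue
            # Elder rule: the component with the higher peak survives.
            if birth[survivor] < birth[dying]:
                survivor, dying = dying, survivor
            if birth[dying] > value:  # skip zero-persistence pairs
                pairs.append((birth[dying], value))
            parent[dying] = survivor

    # The grid is connected, so one component never dies; by convention
    # here it is closed off at the global minimum.
    pairs.append((order[0][0], order[-1][0]))
    return pairs


def total_persistence(pairs):
    """Sum of bar lengths: one possible continuous summary of a diagram."""
    return sum(b - d for b, d in pairs)


# Toy expression map with two separated peaks of height 3 and 2.
toy = [
    [0, 0, 0, 0, 0],
    [0, 3, 0, 2, 0],
    [0, 0, 0, 0, 0],
]
toy_pairs = superlevel_persistence(toy)
print(toy_pairs)                     # [(2, 0), (3, 0)]
print(total_persistence(toy_pairs))  # 5
```

Each pair records the expression level at which a high-expression region appears and the level at which it merges into a more prominent one, so the output varies continuously with the data rather than passing through a significance threshold. Total persistence is only one possible summary, and it can also be large for noisy patterns with many small peaks, so it should not be read as the score used in the paper.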

## Abstract

Spatial transcriptomics studies are becoming increasingly large and commonplace, necessitating simultaneous analysis of a large number of spatially resolved variables. Correspondingly, a diverse range of methodologies have been proposed to compare the spatial expression structure of genes. Here, we apply persistent homology, a method from topological data analysis, to produce a continuous quantification of spatial structure in a given gene’s expression, and show how this can be used for downstream tasks such as spatially variable gene identification. We explore the unique advantages of topology for this task, deriving biologically meaningful insights into kidney disease and myocardial infarction using public spatial transcriptomics data. We also show how the non-parametric nature of homology enables our methodology to extend naturally to other spatial omics modalities, demonstrating this on a spatial metabolomics sample. Our work showcases the advantages of using a continuous quantification of spatial structure over p-value based approaches to SVG identification, the potential for developing unified methods for the analysis of different spatial omics modalities, and the utility of persistent homology in big data applications.

The online version contains supplementary material available at 10.1007/s11538-026-01618-2.

## Linked entities

- **Diseases:** kidney disease (MONDO:0001343), myocardial infarction (MONDO:0005068)

## Full-text entities

- **Genes:** PTGDS (prostaglandin D2 synthase) [NCBI Gene 5730] {aka L-PGDS, LPGDS, PDS, PGD2, PGDS, PGDS2}, NOC2L (NOC2 like nucleolar associated transcriptional repressor) [NCBI Gene 26155] {aka NET15, NET7, NIR, PPP1R112}, PODXL (podocalyxin like) [NCBI Gene 5420] {aka Gp200, PC, PCLP, PCLP-1, PDX, PODXL1}, TNNT2 (troponin T2, cardiac type) [NCBI Gene 7139] {aka CMD1D, CMH2, CMPD2, LVNC6, RCM3, TnTC}, HTRA1 (HtrA serine peptidase 1) [NCBI Gene 5654] {aka ARMD7, CADASIL2, CARASIL, CARASIL2, HtrA, L56}, TPM1 (tropomyosin 1) [NCBI Gene 7168] {aka C15orf13, CMD1Y, CMH3, HEL-S-265, HTM-alpha, LVNC9}, TGFBR2 (transforming growth factor beta receptor 2) [NCBI Gene 7048] {aka AAT3, FAA3, LDS1B, LDS2, LDS2B, MFS2}, Sparc (secreted acidic cysteine rich glycoprotein) [NCBI Gene 20692] {aka BM-40, ON}, Col1a2 (collagen, type I, alpha 2) [NCBI Gene 12843] {aka Col1a-2, Cola-2, Cola2, oim}, MYL3 (myosin light chain 3) [NCBI Gene 4634] {aka CMH8, MLC-lV/sb, MLC1SB, MLC1V, VLC1, VLCl}, MYL2 (myosin light chain 2) [NCBI Gene 4633] {aka CMH10, MFM12, MLC-2, MLC-2s/v, MLC-2v, MLC2}, COX7B (cytochrome c oxidase subunit 7B) [NCBI Gene 1349] {aka APLCC, LSDMCA2}, UMOD (uromodulin) [NCBI Gene 7369] {aka ADMCKD2, ADTKD1, FJHN, HNFJ, HNFJ1, MCKD2}, CKM (creatine kinase, M-type) [NCBI Gene 1158] {aka CKMM, CPK-M, M-CK}, TNNC1 (troponin C1, slow skeletal and cardiac type) [NCBI Gene 7134] {aka CMD1Z, CMH13, TN-C, TNC, TNNC}, RNF207 (ring finger protein 207) [NCBI Gene 388591] {aka C1orf188}, Col1a1 (collagen, type I, alpha 1) [NCBI Gene 12842] {aka Col1a-1, Cola-1, Cola1, Mov-13, Mov13}, Fn1 (fibronectin 1) [NCBI Gene 14268] {aka E330027I09, Fn, Fn-1}, IFI27 (interferon alpha inducible protein 27) [NCBI Gene 3429] {aka FAM14D, ISG12, ISG12A, P27}, IGFBP5 (insulin like growth factor binding protein 5) [NCBI Gene 3488] {aka IBP5}, ACTA1 (actin alpha 1, skeletal muscle) [NCBI Gene 58] {aka ACTA, ASMA, CFTD, CFTD1, CFTDM, CMYO2A}, MYH7 (myosin heavy chain 7) [NCBI Gene 4625] {aka CMD1S, CMH1, CMYO7A, CMYO7B, CMYP7A, CMYP7B}, Col3a1 (collagen, type III, alpha 1) [NCBI Gene 12825] {aka Col3a-1, Tsk-2, Tsk2}, ACTC1 (actin alpha cardiac muscle 1) [NCBI Gene 70] {aka ACTC, ASD5, CMD1R, CMH11, LVNC4}
- **Diseases:** tubular atrophy (MESH:D001284), CKD (MESH:D051436), Myocardial Infarction (MESH:D009203), cancerous (MESH:D009369), SV (MESH:D002303), myocardial infraction (MESH:C535636), ischaemic (MESH:D018917), Cardiac Fibrosis (MESH:D005355), chronic diseases (MESH:D002908), heart disease (MESH:D006331), AKI (MESH:D058186), Kidney Disease (MESH:D007674), infarcted (MESH:D007238)
- **Chemicals:** Cer (-), H&E (MESH:D006371), chloride (MESH:D002712), ceramide (MESH:D002518), sphingolipids (MESH:D013107)
- **Species:** Homo sapiens (human, species) [taxon 9606], Rattus norvegicus (brown rat, species) [taxon 10116]
- **Cell lines:** AKK003-157775 — Homo sapiens (Human), Melanoma, Cancer cell line (CVCL_B4KF)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12960454/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12960454/full.md

## References

2 references — full list in the complete paper: https://tomesphere.com/paper/PMC12960454/full.md

---
Source: https://tomesphere.com/paper/PMC12960454