# Adding highly variable genes to spatially variable genes can improve cell type clustering performance in spatial transcriptomics data

**Authors:** Yijun Li, Stefan Stanojevic, Bing He, Zheng Jing, Qianhui Huang, Jian Kang, Lana X Garmire

PMC · DOI: 10.1093/bioadv/vbaf285 · Bioinformatics Advances · 2025-11-20

## TL;DR

Adding highly variable (HV) genes to spatially variable (SV) genes improves cell-type clustering in spatial transcriptomics data.

## Contribution

This study shows that combining highly variable and spatially variable genes enhances clustering performance in spatial transcriptomics.
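A minimal sketch of the combination step follows. It assumes each selection method has already returned a list of gene symbols. The union is taken in first-seen order with duplicates dropped, and the expression data are then restricted to those genes before clustering. The helper names `combine_gene_sets` and `subset_expression`, the dict-of-lists expression layout, and the toy gene lists are hypothetical. The authors' actual pipeline is in the ST_benchmark repository.

```python
from typing import Dict, List, Sequence


def combine_gene_sets(hv_genes: Sequence[str], sv_genes: Sequence[str]) -> List[str]:
    """Union of HV and SV genes: HV genes first, then SV genes not already present."""
    # dict.fromkeys keeps first-seen order (Python 3.7+) and drops duplicates
    return list(dict.fromkeys([*hv_genes, *sv_genes]))


def subset_expression(expr: Dict[str, List[float]], genes: Sequence[str]) -> Dict[str, List[float]]:
    """Keep only the selected genes from a gene -> per-spot expression mapping."""
    # Genes missing from the expression table are skipped silently
    return {g: expr[g] for g in genes if g in expr}


# Toy inputs (hypothetical gene lists, not taken from the paper)
hv = ["ERBB2", "GENE_A", "GENE_B"]
sv = ["GENE_B", "CALCA", "ERBB2"]
combined = combine_gene_sets(hv, sv)
print(combined)  # ['ERBB2', 'GENE_A', 'GENE_B', 'CALCA']
```

Taking a de-duplicated union, rather than literally appending the lists, keeps genes found by both methods from being counted twice in downstream dimensionality reduction.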

## Key findings

- Combining highly variable and spatially variable genes improves cell-type clustering performance.
- Results were validated across over 50 real datasets from multiple spatial transcriptomics platforms.
- Clustering metrics showed consistent improvement when both gene sets were used together.
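The sketch below makes "clustering metrics" concrete with a pure-Python adjusted Rand index (ARI). ARI is a widely used non-spatial score of agreement between predicted clusters and reference cell-type labels. This is an illustrative choice that assumes annotated labels are available. The exact metrics reported in the study are described in the full text. The labels below are toy values, not results from the paper.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two partitions of the same cells or spots."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    # Contingency-table cells n_ij, row sums a_i, column sums b_j
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    index = sum(comb(c, 2) for c in cells.values())
    sum_a = sum(comb(c, 2) for c in rows.values())
    sum_b = sum(comb(c, 2) for c in cols.values())
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions are trivially identical
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (index - expected) / (max_index - expected)


# Toy comparison: reference annotations vs. clusterings from two gene sets
truth = ["typeA", "typeA", "typeA", "typeB", "typeB", "typeB"]
clusters = {
    "HV only": [0, 0, 1, 1, 1, 1],
    "HV + SV": [0, 0, 0, 1, 1, 1],
}
for name, pred in clusters.items():
    print(name, round(adjusted_rand_index(truth, pred), 3))
```

ARI is invariant to how cluster IDs are named. It is 1.0 for identical partitions and is centered near 0 for random assignments.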

## Abstract

Spatial transcriptomics allows researchers to analyze transcriptome data within the spatial context of the tissue sample. Various methods have been developed to detect spatially variable genes (SV genes), whose expression across the tissue shows strong spatial autocorrelation. Such genes are often used downstream to define clusters of cells or spots. However, clustering analyses have conventionally relied on highly variable (HV) genes, whose expression varies significantly from cell to cell.
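The sketch below contrasts the two selection ideas in the paragraph above. It scores spatial autocorrelation with Moran's I, one widely used statistic, using hypothetical binary neighbor weights based on a fixed radius. The SV detection methods evaluated in the paper differ in their details. For "highly variable" genes, it ranks genes by plain variance across spots. This simplifies the mean-adjusted dispersion that standard HV pipelines use. On the toy data, an alternating gene has the highest variance but negative autocorrelation, while a contiguous patch is the top spatial gene. This is why the two gene lists can differ.

```python
from math import dist
from statistics import pvariance
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float]


def radius_weights(coords: Sequence[Coord], radius: float) -> List[List[float]]:
    """Binary weights: w_ij = 1 if spots i != j lie within `radius` of each other."""
    n = len(coords)
    return [[1.0 if i != j and dist(coords[i], coords[j]) <= radius else 0.0
             for j in range(n)] for i in range(n)]


def morans_i(values: Sequence[float], w: Sequence[Sequence[float]]) -> float:
    """Moran's I = (N / W) * sum_ij w_ij d_i d_j / sum_i d_i^2, with d = x - mean(x)."""
    n = len(values)
    mean = sum(values) / n
    d = [v - mean for v in values]
    denom = sum(x * x for x in d)
    w_total = sum(sum(row) for row in w)
    if denom == 0 or w_total == 0:
        return 0.0  # constant gene or no neighbors: treated as no autocorrelation
    num = sum(w[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
    return (n / w_total) * num / denom


def top_variable(expr: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Simplified HV selection: rank genes by raw variance across spots."""
    return sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)[:k]


def top_spatial(expr: Dict[str, Sequence[float]], coords: Sequence[Coord],
                k: int, radius: float = 1.0) -> List[str]:
    """Simplified SV selection: rank genes by Moran's I on radius-based neighbors."""
    w = radius_weights(coords, radius)
    return sorted(expr, key=lambda g: morans_i(expr[g], w), reverse=True)[:k]


# Six spots on a line; toy expression values (not data from the paper)
coords = [(float(x), 0.0) for x in range(6)]
expr = {
    "patch": [5.0, 5.0, 5.0, 0.0, 0.0, 0.0],    # one contiguous high-expression region
    "checker": [6.0, 0.0, 6.0, 0.0, 6.0, 0.0],  # alternates from spot to spot
    "flat": [1.0] * 6,                          # constant everywhere
}
print(top_variable(expr, 1))         # ['checker']
print(top_spatial(expr, coords, 1))  # ['patch']
```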

In this report, we investigate whether adding highly variable genes to spatially variable genes can improve the cell type clustering performance in spatial transcriptomics data. We tested the clustering performance of HV genes, SV genes, and the union of both gene sets (concatenation) on over 50 real spatial transcriptomics datasets across multiple platforms, using a variety of spatial and non-spatial metrics. Our results show that combining HV genes and SV genes can improve overall cell-type clustering performance.
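Spatial metrics score how coherent clusters are in tissue space, rather than measuring agreement with reference labels. As an illustration only, `neighbor_agreement` below reports the fraction of neighboring spot pairs within a chosen radius that share a cluster label. It is a hypothetical helper and not necessarily one of the spatial metrics used in the study. Spatially smooth clusterings score higher than salt-and-pepper ones.

```python
from math import dist
from typing import Hashable, Sequence, Tuple


def neighbor_agreement(coords: Sequence[Tuple[float, float]],
                       labels: Sequence[Hashable], radius: float) -> float:
    """Fraction of neighboring spot pairs (distance <= radius) assigned the same cluster."""
    if len(coords) != len(labels):
        raise ValueError("coords and labels must have the same length")
    same = pairs = 0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if dist(coords[i], coords[j]) <= radius:
                pairs += 1
                if labels[i] == labels[j]:
                    same += 1
    return same / pairs if pairs else float("nan")  # NaN when no spot has a neighbor


# Toy clusterings of six spots on a line (illustrative, not results from the paper)
coords = [(float(x), 0.0) for x in range(6)]
smooth = [0, 0, 0, 1, 1, 1]
noisy = [0, 1, 0, 1, 0, 1]
print(neighbor_agreement(coords, smooth, 1.0))  # 0.8
print(neighbor_agreement(coords, noisy, 1.0))   # 0.0
```

The score depends on the radius. On a real array, the radius would be chosen from the platform's spot spacing.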

All data and code used in this evaluation study are available at https://github.com/lanagarmire/ST_benchmark.

## Full-text entities

- **Genes:** ERBB2 (erb-b2 receptor tyrosine kinase 2) [NCBI Gene 2064] {aka CD340, HER-2, HER-2/neu, HER2, MLN 19, MLN-19}, CALCA (calcitonin related polypeptide alpha) [NCBI Gene 796] {aka CALC1, CGRP, CGRP-I, CGRP-alpha, CGRP1, CT}
- **Diseases:** Ovarian Cancer (MESH:D010051), cancer (MESH:D009369), NSCLC (MESH:D002289), Breast Cancer (MESH:D001943), AMI (MESH:D000275)
- **Platforms:** Visium
- **Species:** Homo sapiens (human, species) [taxon 9606], Mus musculus (house mouse, species) [taxon 10090]
- **Software:** Monocle3

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12809558/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12809558/full.md

## References

46 references — full list in the complete paper: https://tomesphere.com/paper/PMC12809558/full.md

---
Source: https://tomesphere.com/paper/PMC12809558