When Genes Speak: A Semantic-Guided Framework for Spatially Resolved Transcriptomics Data Clustering
Jiangkai Long, Yanran Zhu, Chang Tang, Kun Sun, Yuanyuan Liu, Xuesong Yan

TL;DR
SemST is a deep learning framework that combines the semantic information carried by gene symbols with spatial context to improve clustering accuracy in spatial transcriptomics. It uses LLMs to embed gene semantics and GNNs to model spatial neighborhood relationships.
Contribution
Introduces SemST, which integrates biological semantics (via LLMs) with spatial relationships (via GNNs), and proposes the FSM module to make better use of biological priors.
Findings
Achieves state-of-the-art clustering performance on public datasets.
FSM module improves baseline methods when integrated.
Semantic-guided embeddings enhance biological interpretability.
Abstract
Spatial transcriptomics enables gene expression profiling with spatial context, offering unprecedented insights into the tissue microenvironment. However, most computational models treat genes as isolated numerical features, ignoring the rich biological semantics encoded in their symbols. This prevents a truly deep understanding of critical biological characteristics. To overcome this limitation, we present SemST, a semantic-guided deep learning framework for spatial transcriptomics data clustering. SemST leverages Large Language Models (LLMs) to enable genes to "speak" through their symbolic meanings, transforming gene sets within each tissue spot into biologically informed embeddings. These embeddings are then fused with the spatial neighborhood relationships captured by Graph Neural Networks (GNNs), achieving a coherent integration of biological function and spatial structure. We…
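The fusion idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the hash-based `toy_gene_embedding` is a hypothetical stand-in for a real LLM embedding of a gene symbol, the single row-normalised neighbor-averaging step stands in for a trained GNN, and all function names, the choice of top-expressed genes per spot, and the concatenation used for fusion are assumptions made for illustration only.

```python
import hashlib
import numpy as np


def toy_gene_embedding(symbol: str, dim: int = 16) -> np.ndarray:
    """Hypothetical stand-in for an LLM embedding of a gene symbol.

    A real pipeline would query a language model; here the symbol is
    hashed into a seed so the vector is deterministic per symbol.
    """
    seed = int.from_bytes(hashlib.sha256(symbol.encode()).digest()[:4], "little")
    return np.random.default_rng(seed).standard_normal(dim)


def spot_semantic_embedding(expr, genes, top_k=3, dim=16):
    """Average the embeddings of each spot's top_k most expressed genes."""
    table = np.stack([toy_gene_embedding(g, dim) for g in genes])
    top = np.argsort(-expr, axis=1)[:, :top_k]   # (n_spots, top_k) gene indices
    return table[top].mean(axis=1)               # (n_spots, dim)


def knn_adjacency(coords, k=2):
    """Symmetrised k-nearest-neighbor graph with self-loops, row-normalised."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)               # exclude self from neighbor search
    nbrs = np.argsort(dist, axis=1)[:, :k]
    n = len(coords)
    adj = np.eye(n)                              # self-loops
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    adj = np.maximum(adj, adj.T)                 # make the graph undirected
    return adj / adj.sum(axis=1, keepdims=True)


def fuse(expr, genes, coords, k=2):
    """Concatenate spatially smoothed expression and semantic features."""
    sem = spot_semantic_embedding(expr, genes)
    adj = knn_adjacency(coords, k)
    spatial = adj @ np.log1p(expr)               # one neighbor-averaging step
    return np.concatenate([spatial, adj @ sem], axis=1)
```

The fused matrix has one row per spot and could be passed to any standard clustering routine; a trained model would learn the propagation and fusion weights rather than fixing them as done here.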
Taxonomy
Topics: Single-cell and spatial transcriptomics · Bioinformatics and Genomic Networks · Gene expression and cancer classification
