SpaLLM: a general framework for spatial domain identification with large language models
Zeyu Zou, Ziheng Duan

TL;DR
SpaLLM improves spatial domain identification in tissues by combining gene expression data with biological knowledge from gene descriptions using large language models.
Contribution
SpaLLM introduces a novel framework that integrates LLM-derived gene functional knowledge with spatial transcriptomics data for better spatial domain identification.
Findings
SpaLLM enhances spatial domain identification when tested on Visium and osmFISH datasets.
The framework improves performance by combining gene expression with biologically informed gene representations.
SpaLLM is modular and compatible with existing spatial analysis pipelines.
Abstract
Spatial transcriptomics (ST) technologies enable the profiling of gene expression while preserving spatial context, offering unprecedented insights into tissue organization. However, traditional spatial domain identification methods primarily rely on gene expression matrices and spatial coordinates while overlooking the rich biological knowledge encoded in gene functional descriptions. Here, we propose SpaLLM, a general framework that integrates large language model (LLM) embeddings of gene descriptions with conventional spatial transcriptomics analysis. Our approach leverages pre-computed GenePT embeddings from NCBI gene summaries to create biologically-informed gene representations. SpaLLM combines these LLM-derived gene features with cell-gene expression matrices through matrix multiplication, generating enriched cell representations that capture both expression patterns and…
Click any figure to enlarge with its caption.
FIGURE 1| Slice ID | Dimensions (cells | Original density (%) | Quality level density (%) | |||
|---|---|---|---|---|---|---|
| Q1 | Q2 | Q3 | Q4 | |||
| 151507 | (4221, 3000) | 12.75 | 6.38 | 3.19 | 1.60 | 0.80 |
| 151508 | (4381, 3000) | 11.96 | 5.98 | 2.99 | 1.50 | 0.75 |
| 151509 | (4788, 3000) | 12.92 | 6.46 | 3.23 | 1.62 | 0.81 |
| 151510 | (4595, 3000) | 12.88 | 6.44 | 3.22 | 1.61 | 0.81 |
| 151669 | (3636, 3000) | 12.71 | 6.36 | 3.18 | 1.59 | 0.80 |
| 151670 | (3484, 3000) | 12.72 | 6.36 | 3.18 | 1.59 | 0.80 |
| 151671 | (4093, 3000) | 13.70 | 6.85 | 3.43 | 1.72 | 0.86 |
| 151672 | (3888, 3000) | 13.41 | 6.70 | 3.35 | 1.68 | 0.84 |
| 151673 | (3611, 3000) | 15.18 | 7.59 | 3.80 | 1.90 | 0.95 |
| 151674 | (3635, 3000) | 16.81 | 8.41 | 4.20 | 2.10 | 1.05 |
| 151675 | (3566, 3000) | 13.60 | 6.80 | 3.40 | 1.70 | 0.85 |
| 151676 | (3431, 3000) | 14.36 | 7.18 | 3.59 | 1.80 | 0.90 |
| Method | Quality | Donor 1 | |||
|---|---|---|---|---|---|
| 151507 | 151508 | 151509 | 151510 | ||
| SpaceFlow | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| SpaceFlow + SpaLLM | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| STAGATE | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| STAGATE + SpaLLM | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| GraphST | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| GraphST + SpaLLM | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| stCluster | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| stCluster + SpaLLM | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| Method | Quality | Donor 2 | |||
|---|---|---|---|---|---|
| 151669 | 151670 | 151671 | 151672 | ||
| SpaceFlow | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| SpaceFlow + SpaLLM | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| STAGATE | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| STAGATE+SpaLLM | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| GraphST | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| GraphST + SpaLLM | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| stCluster | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| stCluster + SpaLLM | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| Method | Quality | Donor 3 | |||
|---|---|---|---|---|---|
| 151673 | 151674 | 151675 | 151676 | ||
| SpaceFlow | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| SpaceFlow + SpaLLM | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| STAGATE | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| STAGATE + SpaLLM | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| GraphST | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| GraphST + SpaLLM | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| stCluster | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| stCluster + SpaLLM | Q0 |
|
|
|
|
| Q1 |
|
|
|
| |
| Q2 |
|
|
|
| |
| Q3 |
|
|
|
| |
| Q4 |
|
|
|
| |
| Dataset | Strategy/Model | Q0 | Q1 | Q2 | Q3 | Q4 |
|---|---|---|---|---|---|---|
| DLPFC | Expression only | 0.55 | 0.53 | 0.22 | 0.12 | 0.02 |
| Functional only | 0.54 | 0.53 | 0.24 | 0.13 | 0.02 | |
| Concatenation | 0.56 | 0.54 | 0.24 | 0.14 | 0.04 | |
| Weighted (ada) |
|
|
| 0.16 | 0.06 | |
| Weighted (large) |
| 0.55 | 0.25 |
|
| |
| osmFISH | Expression only | 0.397 | 0.372 | 0.305 | 0.182 | 0.085 |
| Functional only | 0.402 | 0.374 | 0.331 | 0.205 | 0.132 | |
| Concatenation | 0.411 | 0.389 | 0.357 | 0.215 | 0.142 | |
| Weighted (ada) | 0.420 | 0.401 | 0.358 | 0.238 | 0.168 | |
| Weighted (large) |
|
|
|
|
|
| Method | Q0 | Q1 | Q2 | Q3 | Q4 |
|---|---|---|---|---|---|
| SpaceFlow |
|
|
|
|
|
| SpaceFlow + SpaLLM |
|
|
|
|
|
| STAGATE |
|
|
|
|
|
| STAGATE + SpaLLM |
|
|
|
|
|
| GraphST |
|
|
|
|
|
| GraphST + SpaLLM |
|
|
|
|
|
| stCluster |
|
|
|
|
|
| stCluster + SpaLLM |
|
|
|
|
|
| Method | Subregion (1,000 cells) | Full tissue (4,839 cells) |
|---|---|---|
| SpaceFlow + SpaLLM (vs. base) |
|
|
| STAGATE + SpaLLM (vs. base) |
|
|
| GraphST + SpaLLM (vs. base) |
|
|
| stCluster + SpaLLM (vs. base) |
|
|
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSingle-cell and spatial transcriptomics · Gene expression and cancer classification · Bioinformatics and Genomic Networks
Introduction
1
Spatial transcriptomics (ST) technologies have revolutionized our understanding of tissue architecture by enabling simultaneous measurement of gene expression and spatial location information (Rodriques et al., 2019; Emani et al., 2024; Ruzicka et al., 2024). A fundamental task in ST analysis is spatial domain identification, which aims to partition tissue sections into distinct regions based on similar gene expression patterns and spatial proximity. These spatial domains often correspond to anatomical structures, functional units, or pathological states, making their accurate identification crucial for understanding tissue organization and disease mechanisms (Chen et al., 2022).
Current spatial domain identification methods predominantly follow a graph-based approach, where spots or cells are represented as nodes in a spatial graph, and edges encode spatial proximity relationships (Hu et al., 2021; Zhao et al., 2021; Duan et al., 2024a; Duan et al., 2024b; Duan et al., 2025b). These methods typically employ graph neural networks (GNNs) or graph autoencoders to learn latent representations from cell-by-gene expression matrices and spatial coordinates, followed by clustering algorithms to identify spatial domains (Dong and Zhang, 2022; Long et al., 2023). While effective, these approaches have a fundamental limitation: they treat genes merely as numerical features without leveraging the extensive biological knowledge accumulated about gene functions, pathways, and interactions.
Recent advances in large language models (LLMs) have demonstrated remarkable capabilities in understanding biological text. The GenePT framework has shown that LLM embeddings of gene descriptions from NCBI can effectively capture biological relationships and improve downstream tasks in single-cell analysis (Chen and Zou, 2024). Specifically, GenePT uses pre-computed OpenAI text embeddings on NCBI gene summaries, demonstrating that these embeddings often outperform expression-based methods for various biological tasks.
Motivated by these observations, we propose SpaLLM, as shown in Figure 1, a general framework that integrates LLM-derived gene functional features with traditional spatial transcriptomics analysis. Our key insight is that gene functional descriptions contain rich biological knowledge that can inform spatial domain identification beyond what expression patterns alone can reveal. By leveraging pre-trained language models to encode gene descriptions from biological databases, we create biologically-informed gene representations that capture functional relationships, pathway memberships, and molecular mechanisms.
Overview of SpaLLM framework for integrating gene functional knowledge into spatial domain identification. Left: Input spatial transcriptomics data consists of tissue slices with observed gene expression matrix Xobs∈Rn×m and spatial coordinates C∈Rn×2 . Middle: SpaLLM operates through dual complementary pathways: (1) Traditional encoder-decoder architecture processes expression data Xobs and coordinates C through spatial graph neural networks to generate spatially-aware cell embeddings HST∈Rn×dST ; (2) Gene functional descriptions from NCBI database (exemplified by MBP gene summary) are encoded using pre-trained GPT embedding models to create gene feature matrix F∈Rm×dLLM , which is then multiplied with expression matrix to produce functional cell embeddings HLLM=XobsF . Right: The two embedding streams—capturing expression patterns with spatial context and gene functional characteristics, respectively—are integrated through weighted combination and fed into clustering algorithms for enhanced spatial domain identification, leveraging both quantitative expression data and qualitative biological knowledge.
The SpaLLM framework introduces a novel approach that enhances traditional spatial transcriptomics analysis by integrating gene functional knowledge. Following the standard encoder-decoder paradigm, we first obtain cell embeddings from ST data using existing graph-based methods. Simultaneously, we derive functional cell embeddings by multiplying the cell-by-gene expression matrix with LLM-derived gene embeddings , where the gene feature matrix encodes biological functions extracted from NCBI descriptions. We then combine these two complementary cell representations—one capturing expression patterns and spatial context, the other encoding functional characteristics—and feed the integrated features into clustering algorithms for improved spatial domain identification.
Our main contributions are as follows:
- We introduce the first systematic framework to integrate LLM embeddings of gene functional descriptions into spatial transcriptomics analysis;
- We demonstrate that incorporating biological knowledge through gene text descriptions significantly improves spatial domain identification accuracy, particularly for low-quality data;
- We provide a modular framework compatible with existing spatial analysis methods, enabling broad adoption across different research pipelines;
- We conduct comprehensive experiments across multiple datasets with varying quality levels, showing consistent 3.8%–400% improvements over state-of-the-art methods.
Methods
2
Problem formulation and SpaLLM architecture
2.1
We formulate the problem of spatial domain identification as a clustering task on a graph. Let be the raw gene expression matrix, where is the number of spatial spots and is the number of genes. The spatial coordinates of the spots are given by the matrix . The goal is to learn a mapping that transforms the input data into a latent feature space , where is the final embedding dimension. Subsequently, a clustering algorithm is applied to to identify the spatial domains, i.e., .
The SpaLLM framework enhances this process by incorporating biological knowledge from gene functional descriptions. We introduce a dual-stream encoding architecture that combines a conventional spatial encoder with a novel LLM-based functional encoder, as illustrated in a figure. The outputs of these two encoders are fused to produce the final enriched embeddings , which are then used for clustering.
Gene functional encoding
2.2
To capture the biological meaning of genes, we leverage pre-computed embeddings from a large language model trained on a corpus of gene functional descriptions (e.g., NCBI gene summaries). Let be the pre-computed gene feature matrix, where each row is the vector representation of the -th gene’s description. This feature matrix serves as a look-up table for the functional representation of each gene. The functional encoder transforms the gene expression data into a functional feature space. We define the functional spot embeddings as:
Here, is the functional encoder. This operation implicitly weighs the contribution of each functional dimension based on the expression level of corresponding genes in each spot.
Spatial encoder
2.3
The spatial encoder captures both gene expression patterns and spatial context. We construct a spatial graph where the set of vertices corresponds to the spots. The edge weights are defined by the adjacency matrix , where represents the spatial proximity between spot and spot . A common choice for computing is a Gaussian kernel on the spatial coordinates:
Here, and are hyperparameters.
The spatial encoder, , is a Graph Neural Network (GNN) that learns spatially-aware spot embeddings by propagating and aggregating features over the graph. The input to the GNN is the normalized gene expression matrix, :
The GNN layers are typically defined by a message passing scheme. For a multi-layer GNN, the update rule for the -th layer is:
where , is the normalized adjacency matrix, is a trainable weight matrix, and is a non-linear activation function.
Feature fusion and clustering
2.4
The embeddings from the two encoders, and , may have different dimensions . To ensure feature alignment before fusion, we first perform Principal Component Analysis (PCA) on the LLM embeddings to project them into the same dimension as the spatial embeddings. This yields the dimension-reduced functional embeddings, .
The final spot representation is then obtained by a weighted combination of the spatial and dimension-reduced functional embeddings:
The hyperparameters and control the contribution of each embedding stream. The combined embeddings capture both spatial proximity/expression patterns and biological functional knowledge.
Finally, the spatial domains are identified by applying a clustering algorithm, such as K-means, Louvain, or mclust, to the fused feature matrix .
Experimental setup
3
Datasets and data quality simulation
3.1
We evaluate SpaLLM on the human dorsolateral prefrontal cortex (DLPFC) spatial transcriptomics datasets from Maynard et al. (2021), which consist of 12 tissue sections with manually annotated spatial domains. To assess the robustness of our framework against varying data quality, we adopt a systematic simulation strategy where data quality is reduced by introducing sparsity Duan et al. (2025a). The original, unaltered data serves as a baseline for comparison. We create four simulated quality levels by randomly masking a percentage of the non-zero gene expression values: Q1 (50% masked), Q2 (75% masked), Q3 (87.5% masked), and Q4 (93.75% masked). This process generates a comprehensive testbed of 48 simulated datasets (12 original sections 4 quality levels), enabling a robust evaluation of SpaLLM’s performance under different conditions. The characteristics of these datasets are summarized in Table 1.
To validate cross-platform generalizability, we incorporated the osmFISH dataset (Codeluppi et al., 2018) of the mouse somatosensory cortex. This dataset utilizes imaging-based technology, which provides a higher spatial resolution but fewer genes compared to the sequencing-based Visium platform, offering a complementary modality for testing.
Implementation details
3.2
GenePT feature configuration
3.2.1
We use the pre-computed GenePT embeddings (Chen and Zou, 2024) with the following specifications:
- Embedding model: OpenAI text-embedding-ada-002.
- Feature dimension: .
- Gene coverage: 33,000+ genes with NCBI annotations.
Model hyperparameters
3.2.2
We adopt four representative baselines: SpaceFlow (Ren et al., 2022), STAGATE (Dong and Zhang, 2022), GraphST (Long et al., 2023), and stCluster (Wang et al., 2024). The hyperparameters for each method are set following the configurations recommended in their original papers. For SpaLLM integration, we set the weighting parameters and as default values to achieve balanced integration between spatial expression features and functional embeddings.
Evaluation metrics
3.3
We evaluate spatial domain identification using the Adjusted Rand Index (ARI), a robust metric for measuring the similarity between a predicted clustering and the ground truth. The ARI corrects for chance agreements and has a value of 1.0 for perfect clustering and 0 for random assignments. The formula for ARI is defined as:
Where is the Rand Index, is the expected Rand Index for a random partition, and is the maximum possible Rand Index.
Results
4
SpaLLM demonstrates consistent improvements across quality levels
4.1
We evaluated SpaLLM’s performance on spatial domain identification across 12 real-world and 48 simulated datasets with varying quality levels (Q0-Q4, where Q0 represents the original highest quality datasets and Q4 the lowest). Tables 2–4 present comprehensive results comparing four baseline methods (SpaceFlow, STAGATE, GraphST, and stCluster) with their SpaLLM-enhanced versions across three different donor samples. Results are averaged across ten runs.
The integration of SpaLLM with baseline methods shows consistent improvements across all quality levels and datasets. Notably, the performance gains become more pronounced as data quality decreases, highlighting SpaLLM’s robustness in challenging scenarios where traditional methods struggle.
Across all three donor samples, SpaceFlow integration with SpaLLM achieved modest but consistent improvements ranging from 3.8% to 9.4% in high-quality datasets (Q0, Q1) to more substantial gains of 11.1%–60.0% in degraded datasets (Q3, Q4). For example, in Donor 1 dataset 151509, SpaceFlow improved from 0.25 to 0.28 (12% gain) at Q3 level, while in dataset 151508, Q4 performance increased from 0.10 to 0.13 (30% gain).
STAGATE integration with SpaLLM demonstrated the most substantial improvements among all tested methods. In high-quality scenarios (Q0, Q1), improvements ranged from 5.3% to 9.4%, with notable examples including Donor 1 dataset 151507 improving from 0.55 to 0.58 (5.5% gain) at Q0. However, the most dramatic gains occurred in degraded data scenarios, with Q3 and Q4 improvements reaching 25%–400%. For instance, in the Donor 2 dataset 151670, Q4 performance surged from 0.01 to 0.05 (400% improvement), and in the Donor 1 dataset 151507, Q3 performance increased from 0.12 to 0.16 (33% gain).
GraphST showed significant benefits from SpaLLM integration, with improvements ranging from 6.2% to 13.8% in high-quality datasets to remarkable gains of up to 400% in the most challenging scenarios. In Donor 2, GraphST + SpaLLM achieved the highest overall performance in Q0 and Q1 levels, with values reaching 0.69 and 0.67, respectively, for dataset 151672. The most striking improvements were observed in Q4 scenarios, where performance increased from as low as 0.01 to 0.05 (400% improvement).
stCluster integration yielded improvements across all quality levels. High-quality datasets (Q0, Q1) showed improvements ranging from 4.7% to 8.6%, while degraded scenarios (Q3, Q4) demonstrated gains of 10.0%–50.0%. Notably, stCluster + SpaLLM achieved several best performances in Q2–Q4 categories, such as 0.44 (Q2) and 0.24 (Q3) in the Donor 1 dataset 151507.
Ablation studies: synergistic effects of LLM priors and integration strategies
4.2
To dissect the specific contributions of the individual components within the SpaLLM framework, we performed comprehensive ablation studies focusing on the feature integration strategy and the choice of the LLM embedding model. We utilized the STAGATE baseline on both the DLPFC (151507) and osmFISH datasets as representative cases.
As quantified in Table 5, we compared four distinct integration strategies: (1) Expression only (standard pipeline), (2) Functional only (LLM knowledge only), (3) Simple concatenation, and (4) Weighted fusion (SpaLLM). The results demonstrate a “synergistic effect.” While expression features are dominant in high-quality data (Q0), they suffer from a catastrophic performance collapse as sparsity increases; for example, the ARI on DLPFC drops from 0.55 (Q0) to a mere 0.02 (Q4). In contrast, the Functional only approach maintains remarkable stability (e.g., maintaining an ARI of 0.132 on osmFISH even at Q4), indicating that biological priors act as a vital regularizer when the transcriptomic signal is severely degraded.
Furthermore, we evaluated two OpenAI embedding models: text-embedding-ada-002 (ada) and text-embedding-3-large (large). Our analysis shows that while simple concatenation only yields marginal gains, our Weighted fusion strategy achieves the best overall performance. Notably, the large model variant exhibits superior robustness in the most challenging scenarios (Q3–Q4), providing the highest ARI across both datasets. However, ada offers a comparable balance with lower computational overhead, which we selected as the default configuration for general efficiency.
Cross-platform generalizability: validation on osmFISH
4.3
To ensure that SpaLLM is technology-agnostic, we extended our evaluation to the osmFISH dataset (mouse somatosensory cortex). Unlike sequencing-based platforms, osmFISH is an imaging-based technology with a high spatial resolution but a specific gene panel (33 marker genes).
We applied SpaLLM to four representative baselines: SpaceFlow, STAGATE, GraphST, and stCluster. As summarized in Table 6, SpaLLM consistently improved the ARI across all quality levels for every baseline. For instance, GraphST + SpaLLM achieved the highest ARI of 0.52 at Q0 (compared to 0.48 for base GraphST) and maintained a significant lead even at Q4 (0.18 vs. 0.09). These results, accompanied by lower performance variance (standard deviations), prove that integrating LLM-derived knowledge provides a universal enhancement for spatial domain identification that is independent of the underlying experimental modality or algorithmic architecture.
Practical guidance for low-throughput and small-sample regions
4.4
To provide concrete guidance for practitioners working with limited tissue sections, we analyzed the performance of SpaLLM on small spatial subregions. We randomly extracted 10 contiguous subregions, each consisting of only 1,000 cells, from the osmFISH tissue.
As summarized in Table 7, the relative performance improvement introduced by SpaLLM is even more pronounced in these small-sample scenarios compared to full-tissue analysis. For example, GraphST’s performance gain increases from 8.3% on full tissue to 27.3% on subregions. This suggests that when spatial context is limited, LLM-derived gene functional knowledge helps anchor the identity of cell clusters, effectively compensating for the lack of local cell-cell interaction information. Based on these results, we recommend SpaLLM as a critical enhancement for experiments involving small biopsies or sparse cell populations where traditional methods often fail to recover clear domain boundaries.
Conclusion and discussion
5
We presented SpaLLM, a general framework that integrates large language model embeddings of gene functional descriptions into spatial transcriptomics analysis. By leveraging pre-computed GenePT features and combining them with expression data through weighted matrix integration, SpaLLM consistently improves spatial domain identification across varying data quality conditions.
Our comprehensive evaluation on 12 sequencing-based DLPFC datasets and an independent imaging-based osmFISH dataset demonstrates substantial improvements in clustering accuracy. The gains range from 4% to 8% in high-quality data to remarkable 200%–400% improvements in severely degraded scenarios. The modular design enables seamless integration with existing spatial analysis methods including SpaceFlow, STAGATE, GraphST, and stCluster, making SpaLLM broadly applicable to diverse research scenarios regardless of the underlying experimental modality.
The success of SpaLLM demonstrates several key advantages: incorporating gene functional knowledge leads to more biologically meaningful clustering results, as evidenced by consistent improvements across all tested methods. Detailed ablation studies confirm that our weighted fusion strategy outperforms simple concatenation, and while newer models like text-embedding-3-large provide superior stability in extreme sparsity, text-embedding-ada-002 remains a highly efficient default for routine analysis. Functional features provide stable signals even when expression data is sparse or degraded, with the most dramatic improvements observed in Q3 and Q4 quality scenarios. Furthermore, our subregion analysis reveals that SpaLLM is particularly transformative for small-scale tissue samples (e.g., 1,000 cells), where the relative improvement in ARI reaches up to 27.3%, effectively compensating for limited spatial context.
While SpaLLM shows consistent effectiveness, its current implementation depends on the accuracy of large language model embeddings for capturing gene functional relationships. However, with the rapid advancement of language model architectures and the continuous expansion of biological knowledge databases, we anticipate that this limitation will be progressively overcome, leading to even more precise functional representations that better capture the complexity of biological systems.
This work establishes a foundation for knowledge-guided spatial omics analysis and demonstrates the potential for large language models to enhance biological discovery through the systematic integration of functional knowledge. The consistent improvements across diverse datasets, varying tissue sizes, and methods suggest that functional knowledge integration represents a promising paradigm for advancing spatial transcriptomics analysis.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Chen Y. Zou J. (2024). Genept: a simple but effective foundation model for genes and cells built from chatgpt. bio Rxiv, 10. 10.1101/2023.10.16.562533 37905130 PMC 10614824 · doi ↗ · pubmed ↗
- 2Chen A. Liao S. Cheng M. Ma K. Wu L. Lai Y. (2022). Spatiotemporal transcriptomic atlas of mouse organogenesis using dna nanoball-patterned arrays. Cell. 185, 1777–1792. 10.1016/j.cell.2022.04.003 35512705 · doi ↗ · pubmed ↗
- 3Codeluppi S. Borm L. E. Zeisel A. La Manno G. van Lunteren J. A. Svensson C. I. (2018). Spatial organization of the somatosensory cortex revealed by osmfish. Nat. Methods 15, 932–935. 10.1038/s 41592-018-0175-z 30377364 · doi ↗ · pubmed ↗
- 4Dong K. Zhang S. (2022). Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun. 13, 1739. 10.1038/s 41467-022-29439-6 35365632 PMC 8976049 · doi ↗ · pubmed ↗
- 5Duan Z. Riffle D. Li R. Liu J. Min M. R. Zhang J. (2024 a). Impeller: a path-based heterogeneous graph learning method for spatial transcriptomic data imputation. Bioinformatics 40, btae 339. 10.1093/bioinformatics/btae 339 38806165 PMC 11256934 · doi ↗ · pubmed ↗
- 6Duan Z. Xu S. Lee C. Riffle D. Zhang J. (2024 b). “Imiracle: an iterative multi-view graph neural network to model intercellular gene regulation from spatial transcriptomic data,” in Proceedings of the 33rd ACM international conference on information and knowledge management, 538–548.10.1145/3627673.3679574 PMC 1163907439679382 · doi ↗ · pubmed ↗
- 7Duan Z. Li X. Xiao Z. Ying R. Zhang J. (2025 a). “Muse: a multi-slice joint analysis method for spatial transcriptomics experiments,” in Proceedings of the 34th ACM international conference on information and knowledge management (New York, NY, USA: Association for Computing Machinery, CIKM ’25), 625–634. 10.1145/3746252.3761240 PMC 1279062541527556 · doi ↗ · pubmed ↗
- 8Duan Z. Li X. Zhang Z. Song J. Zhang J. (2025 b). “Disco: a diffusion model for spatial transcriptomics data completion,” in 2025 IEEE international conference on image processing (ICIP) (IEEE), 19–24.10.1109/icip 55913.2025.11084277 PMC 1240749240909878 · doi ↗ · pubmed ↗
