# SpaLLM: a general framework for spatial domain identification with large language models

**Authors:** Zeyu Zou, Ziheng Duan

PMC · DOI: 10.3389/fbinf.2025.1713975 · Frontiers in Bioinformatics · 2026-01-12

## TL;DR

SpaLLM improves spatial domain identification in tissues by combining gene expression data with biological knowledge drawn from large language model embeddings of gene descriptions.

## Contribution

SpaLLM is a general framework that integrates LLM-derived gene functional knowledge (pre-computed GenePT embeddings of NCBI gene summaries) with spatial transcriptomics data to improve spatial domain identification.

## Key findings

- SpaLLM consistently enhances spatial domain identification on 12 sequencing-based Visium sections and an independent imaging-based osmFISH dataset.
- The framework improves performance by combining gene expression with biologically informed gene representations.
- SpaLLM is modular and compatible with existing spatial analysis pipelines.

## Abstract

Spatial transcriptomics (ST) technologies enable the profiling of gene expression while preserving spatial context, offering unprecedented insights into tissue organization. However, traditional spatial domain identification methods primarily rely on gene expression matrices and spatial coordinates while overlooking the rich biological knowledge encoded in gene functional descriptions. Here, we propose SpaLLM, a general framework that integrates large language model (LLM) embeddings of gene descriptions with conventional spatial transcriptomics analysis. Our approach leverages pre-computed GenePT embeddings from NCBI gene summaries to create biologically informed gene representations. SpaLLM combines these LLM-derived gene features with cell-gene expression matrices through matrix multiplication, generating enriched cell representations that capture both expression patterns and functional knowledge. These enriched features are then integrated with existing graph-based spatial analysis methods for improved spatial domain identification. Extensive validation on 12 sequencing-based Visium sections and an independent imaging-based osmFISH dataset demonstrates that SpaLLM consistently enhances spatial domain identification. Our modular framework can be seamlessly integrated with existing spatial analysis pipelines, making it broadly applicable to diverse research scenarios.
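
The matrix-multiplication step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `enrich_cells`, the row normalization of the expression matrix, and the toy random data are all assumptions made for illustration. It also assumes the rows of the gene embedding matrix have already been aligned with the columns of the expression matrix, and it does not cover the later integration with graph-based spatial methods.

```python
import numpy as np


def enrich_cells(expr, gene_emb, eps=1e-12):
    """Project a cell-by-gene expression matrix into gene-embedding space.

    expr:     (n_cells, n_genes) non-negative expression values
    gene_emb: (n_genes, d) one embedding vector per gene, rows aligned
              with the columns of expr (alignment is assumed, not shown)
    returns:  (n_cells, d) enriched cell representations
    """
    expr = np.asarray(expr, dtype=float)
    gene_emb = np.asarray(gene_emb, dtype=float)
    if expr.shape[1] != gene_emb.shape[0]:
        raise ValueError("expression columns and embedding rows must match")

    # Assumption: normalize each cell's expression to sum to 1 so the result
    # is an expression-weighted average of gene embeddings. The abstract only
    # specifies a matrix multiplication, not this normalization.
    row_sums = np.maximum(expr.sum(axis=1, keepdims=True), eps)
    weights = expr / row_sums

    # Core step: (cells x genes) @ (genes x d) -> (cells x d)
    return weights @ gene_emb


# Toy data standing in for real counts and real GenePT embeddings.
rng = np.random.default_rng(0)
expr = rng.poisson(2.0, size=(5, 4)).astype(float)  # 5 cells, 4 genes
gene_emb = rng.normal(size=(4, 3))                  # 4 genes, 3-dim embeddings

cell_emb = enrich_cells(expr, gene_emb)
print(cell_emb.shape)  # (5, 3)
```

With this normalization, a cell that expresses a single gene receives exactly that gene's embedding, which makes the "expression-weighted knowledge" interpretation easy to check on small inputs.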

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12833451/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12833451/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/PMC12833451/full.md

---
Source: https://tomesphere.com/paper/PMC12833451