MultiST: A Cross-Attention-Based Multimodal Model for Spatial Transcriptomic

Wei Wang, Quoc-Toan Ly, Chong Yu, and Jun Bai

arXiv:2601.13331 · cs.CV · January 21, 2026

Open Access

TL;DR

MultiST is a novel multimodal framework that integrates spatial transcriptomics and histological images using cross-attention, improving tissue domain delineation and biological interpretability.

Contribution

MultiST introduces a cross-attention-based fusion approach that combines gene expression with tissue morphology to improve spatial domain resolution.

Findings

01. Produces clearer, more coherent tissue domain boundaries.

02. Generates more stable pseudotime trajectories.

03. Enhances biological interpretability of cell interactions.

Abstract

Spatial transcriptomics (ST) enables transcriptome-wide profiling while preserving the spatial context of tissues, offering unprecedented opportunities to study tissue organization and cell-cell interactions in situ. Despite recent advances, existing methods often lack effective integration of histological morphology with molecular profiles, relying on shallow fusion strategies or omitting tissue images altogether, which limits their ability to resolve ambiguous spatial domain boundaries. To address this challenge, we propose MultiST, a unified multimodal framework that jointly models spatial topology, gene expression, and tissue morphology through cross-attention-based fusion. MultiST employs graph-based gene encoders with adversarial alignment to learn robust spatial representations, while integrating color-normalized histological features to capture molecular-morphological…
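
To make the fusion idea concrete, the following is a minimal sketch of generic single-head scaled dot-product cross-attention, in which per-spot gene embeddings act as queries over per-spot image embeddings. It is not MultiST's actual architecture, which the truncated abstract does not specify. The function name `cross_attention`, the projection matrices `w_q`, `w_k`, `w_v`, all dimensions, and the random inputs are assumptions made purely for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(gene_emb, image_emb, w_q, w_k, w_v):
    """Generic single-head cross-attention (illustrative, not the paper's exact layer).

    gene_emb:  (n_spots, d_gene) embeddings from a gene-expression encoder (queries)
    image_emb: (n_spots, d_img)  embeddings from histology patches (keys and values)
    Returns the fused features (n_spots, d_attn) and the attention weights.
    """
    q = gene_emb @ w_q                        # project gene features to queries
    k = image_emb @ w_k                       # project image features to keys
    v = image_emb @ w_v                       # project image features to values
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product similarity
    weights = softmax(scores, axis=-1)        # each spot's weights over image patches
    return weights @ v, weights


# Hypothetical sizes and random stand-ins for learned embeddings and projections.
rng = np.random.default_rng(0)
n_spots, d_gene, d_img, d_attn = 5, 16, 32, 8
gene_emb = rng.normal(size=(n_spots, d_gene))
image_emb = rng.normal(size=(n_spots, d_img))
w_q = rng.normal(size=(d_gene, d_attn))
w_k = rng.normal(size=(d_img, d_attn))
w_v = rng.normal(size=(d_img, d_attn))

fused, weights = cross_attention(gene_emb, image_emb, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

In this sketch each row of `weights` sums to one, so every spot's fused vector is a convex combination of projected image features. A trained model would learn the projections and would typically add multiple heads, residual connections, and normalization.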

Taxonomy

Topics: Single-cell and spatial transcriptomics · Cell Image Analysis Techniques · Domain Adaptation and Few-Shot Learning