$MV_{Hybrid}$: Improving Spatial Transcriptomics Prediction with Hybrid State Space-Vision Transformer Backbone in Pathology Vision Foundation Models

Won June Cho; Hongjun Yoon; Daeky Jeong; Hyeongyeol Lim; Yosep Chong

arXiv:2508.00383·cs.CV·August 4, 2025

TL;DR

This paper introduces a hybrid backbone architecture combining state space models with Vision Transformers to improve the prediction of spatial gene expression from pathology images, outperforming existing models in accuracy and robustness.

Contribution

The paper proposes $MV_{Hybrid}$, a novel hybrid backbone architecture that combines state space models with ViTs, and demonstrates superior performance among pathology vision foundation models.

Findings

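01

In leave-one-study-out (LOSO) evaluation, $MV_{Hybrid}$ achieves 57% higher correlation than the best-performing ViT.

02

It shows 43% smaller performance degradation in gene expression prediction when moving from random split to LOSO.

03

It performs equally well or better than ViT in classification, patch retrieval, and survival prediction tasks.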
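
To make the evaluation terms in these findings concrete, the following is a minimal sketch of a leave-one-study-out loop scored by Pearson correlation, plus one plausible way to express "performance degradation" relative to a random split. Everything here is an assumption for illustration: the helper names (`pearson`, `loso_splits`, `relative_degradation`), the toy data, the least-squares predictor, and the degradation formula are hypothetical and are not taken from the paper.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))

def loso_splits(study_ids):
    """Yield (held_out_study, train_idx, test_idx), holding out one study at a time."""
    study_ids = np.asarray(study_ids)
    for study in np.unique(study_ids):
        test = study_ids == study
        yield study, np.flatnonzero(~test), np.flatnonzero(test)

def relative_degradation(random_split_score, loso_score):
    """Fraction of random-split performance lost under LOSO (assumed definition)."""
    return (random_split_score - loso_score) / random_split_score

# Toy data: three hypothetical studies, 8-dim patch features, one gene target.
rng = np.random.default_rng(0)
study_ids = np.repeat(["study_A", "study_B", "study_C"], 40)
X = rng.normal(size=(120, 8))
w_true = rng.normal(size=8)
y = X @ w_true + 0.1 * rng.normal(size=120)

loso_scores = {}
for study, train_idx, test_idx in loso_splits(study_ids):
    # Ordinary least squares stands in for a real prediction head.
    w, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
    loso_scores[study] = pearson(X[test_idx] @ w, y[test_idx])

mean_loso = float(np.mean(list(loso_scores.values())))
print(loso_scores, mean_loso)
```

Holding out an entire study at a time means the test slides come from a source the model never saw during fitting, which is why LOSO is usually a harder and more clinically relevant setting than a random split of the same data.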

Abstract

Spatial transcriptomics reveals gene expression patterns within tissue context, enabling precision oncology applications such as treatment response prediction, but its high cost and technical complexity limit clinical adoption. Predicting spatial gene expression (biomarkers) from routine histopathology images offers a practical alternative, yet current vision foundation models (VFMs) in pathology based on Vision Transformer (ViT) backbones perform below clinical standards. Given that VFMs are already trained on millions of diverse whole slide images, we hypothesize that architectural innovations beyond ViTs may better capture the low-frequency, subtle morphological patterns correlating with molecular phenotypes. By demonstrating that state space models initialized with negative real eigenvalues exhibit strong low-frequency bias, we introduce $MV_{Hybrid}$, a hybrid backbone architecture…
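
As intuition for the low-frequency bias mentioned in the abstract, here is a toy illustration using a scalar continuous-time linear state space model $h'(t) = \lambda h(t) + u(t)$, $y(t) = h(t)$, whose frequency response has magnitude $|H(i\omega)| = 1 / |i\omega - \lambda|$. This is a textbook first-order system chosen for illustration, not the paper's analysis or architecture; the function name `ssm_gain` and the specific eigenvalues are assumptions.

```python
import numpy as np

def ssm_gain(eigenvalue, omegas):
    """Gain |H(i*omega)| of the scalar SSM h'(t) = eigenvalue*h(t) + u(t), y(t) = h(t)."""
    return 1.0 / np.abs(1j * omegas - eigenvalue)

omegas = np.linspace(0.0, 20.0, 2001)

# Negative real eigenvalue: gain is 1/sqrt(omega^2 + 1), largest at omega = 0
# and strictly decreasing, i.e. a low-pass response.
real_gain = ssm_gain(-1.0, omegas)

# Complex eigenvalue with the same decay rate: gain peaks near the imaginary
# part (omega = 8), i.e. a band-pass response rather than a low-pass one.
complex_gain = ssm_gain(-1.0 + 8.0j, omegas)

print("real eigenvalue peak frequency:", omegas[np.argmax(real_gain)])
print("complex eigenvalue peak frequency:", omegas[np.argmax(complex_gain)])
```

In this simplified setting, a negative real eigenvalue passes slowly varying inputs most strongly and attenuates fast oscillations, which matches the intuition of a bias toward low-frequency patterns.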

Taxonomy

Topics: AI in cancer detection · Single-cell and spatial transcriptomics · Domain Adaptation and Few-Shot Learning