HistoPrism: Unlocking Functional Pathway Analysis from Pan-Cancer Histology via Gene Expression Prediction
Susu Hu, Qinghe Zeng, Nithya Bhasker, Jakob Nikolas Kather, Stefanie Speidel

TL;DR
HistoPrism is a transformer-based model that predicts gene expression from histology images across multiple cancer types, capturing biologically meaningful pathways and outperforming previous models in generalization and pathway prediction.
Contribution
We developed HistoPrism, a novel pan-cancer transformer model that improves gene expression prediction from histology and introduces pathway-level evaluation for biological relevance.
Findings
HistoPrism surpasses prior models in gene-level prediction accuracy.
It achieves significant improvements in pathway-level transcriptomic coherence.
The model demonstrates strong generalization across diverse cancer types.
Abstract
Predicting spatial gene expression from H&E histology offers a scalable and clinically accessible alternative to sequencing, but realizing clinical impact requires models that generalize across cancer types and capture biologically coherent signals. Prior work is often limited to per-cancer settings and variance-based evaluation, leaving functional relevance underexplored. We introduce HistoPrism, an efficient transformer-based architecture for pan-cancer prediction of gene expression from histology. To evaluate biological meaning, we introduce a pathway-level benchmark, shifting assessment from isolated gene-level variance to coherent functional pathways. HistoPrism not only surpasses prior state-of-the-art models on highly variable genes but, more importantly, achieves substantial gains on pathway-level prediction, demonstrating its ability to recover biologically coherent…
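To make the shift from gene-level to pathway-level evaluation concrete, here is a minimal sketch of one plausible way to score a pathway: compute the Pearson correlation between predicted and measured expression for each gene across spots, then average those correlations over the genes in each pathway. This is an illustrative assumption, not the paper's benchmark definition; the function names (`per_gene_pearson`, `pathway_scores`), the input layout (spots by genes), and the averaging rule are all hypothetical.

```python
import numpy as np


def per_gene_pearson(y_true, y_pred):
    """Pearson correlation for each gene (column) across spots (rows).

    Genes with zero variance in either array get NaN.
    """
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (yt * yp).sum(axis=0) / denom
    return np.where(denom > 0, r, np.nan)


def pathway_scores(y_true, y_pred, gene_names, pathways):
    """Average per-gene correlation over the member genes of each pathway.

    ASSUMPTION: mean-of-correlations is only one possible aggregation;
    the paper's actual pathway metric may be defined differently.

    y_true, y_pred : arrays of shape (n_spots, n_genes)
    gene_names     : list of gene symbols, one per column
    pathways       : dict mapping pathway name -> list of gene symbols
    """
    r = per_gene_pearson(np.asarray(y_true, float), np.asarray(y_pred, float))
    column = {gene: i for i, gene in enumerate(gene_names)}
    scores = {}
    for name, genes in pathways.items():
        # Keep only pathway genes that were actually measured.
        idx = [column[g] for g in genes if g in column]
        if not idx:
            scores[name] = float("nan")
            continue
        vals = r[idx]
        vals = vals[~np.isnan(vals)]
        scores[name] = float(vals.mean()) if vals.size else float("nan")
    return scores
```

In practice the `pathways` dictionary would be filled from curated gene sets such as the Hallmark and GO collections that the reviews mention. A pathway with no measured genes yields NaN in this sketch rather than a score.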
Peer Reviews
Decision·ICLR 2026 Poster
The paper focuses on the application of ML methods to meaningful, real-world datasets, which is typically lacking in the ML field. The architecture is clearly explained and the experimental results are easy to follow. The proposed model shows competitive performance on a standard metric and improved performance on the proposed metric.
- There was not enough explanation of the GPC benchmark (e.g., no math). I would like to see an explanation that shows how it is computed from the data and model formulated in sections 3.1 and 3.2. I appreciate the effort to bridge the gap between standard ML and computational biology, but since this is an ML conference I believe more details should be provided. As it’s written I don’t understand how the GPC is computed. Note that I do not have a background in computational biology. - Sin
- The paper identifies a key weakness in prior work: evaluation is almost exclusively focused on a small number of highly variable genes, which have been used as the de facto proxy for biological function. By creating a well-curated benchmark based on Hallmark and GO pathways, the authors are pushing the field toward more clinically and biologically meaningful evaluation. - The efficiency benchmarks in Figure 3, showing HistoPrism's linear scaling in time, memory, and FLOPs compared to STPath's
- The paper compares HistoPrism (a regressive model using UNI features) to STPath (a generative masked-autoencoder using GigaPath features). The discussion notes that in the STPath paper itself, an MLP with UNI features outperformed an MLP with GigaPath features. How much of HistoPrism's superior performance, particularly in the holistic clustering task (Table 2), can be attributed to the better pathology foundation models (UNI vs GigaPath) versus the superior architecture (direct-mapping vs. ma
1. This work is enabled by the release of a large ST dataset like HEST-1K. 2. This work tries to incorporate oncology labels as a prior to guide the image embeddings to capture information of different cancer types.
1. Figure 1 is not legible when printed out and read at arm's length. The caption of Figure 1 is not informative. 2. It is great that the authors are trying to make further contributions after the release of HEST-1K but the reviewer thinks that this problem generally lacks the motivation to study. 3. This paper’s contribution is quite minimal, the reviewer thinks it does not fit the ICLR standard. The method proposed and experiments conducted in this paper should be more of a workshop explora
Taxonomy
Topics: Bioinformatics and Genomic Networks · Single-cell and spatial transcriptomics · Gene expression and cancer classification
