MINT: Molecularly Informed Training with Spatial Transcriptomics Supervision for Pathology Foundation Models
Minsoo Lee, Jonghyun Kim, Juseung Yun, Sunwoo Yu, and Jongseong Jang

TL;DR
MINT enhances pathology foundation models by integrating spatial transcriptomics data, improving gene expression prediction and pathology task performance through a novel fine-tuning framework that combines molecular and morphological information.
Contribution
MINT is a fine-tuning method that incorporates spatial transcriptomics supervision into pretrained Vision Transformers for pathology analysis.
Findings
Achieved state-of-the-art gene expression prediction (mean Pearson r = 0.440).
Improved performance on pathology tasks (accuracy = 0.803).
Demonstrated the benefit of combining molecular and morphological data.
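For readers unfamiliar with the metric, a mean Pearson r for gene expression prediction can be sketched as below. This is a minimal illustration that assumes the correlation is computed per gene across spots and then averaged over genes; this page does not state the exact averaging scheme used in the paper, and the function names `pearson_r` and `mean_pearson_across_genes` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when either vector is constant.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def mean_pearson_across_genes(pred, true):
    """Average the per-gene Pearson r over all genes.

    pred, true: lists of rows, one row per spot and one column per gene.
    Genes with undefined correlation are skipped (an assumption of this sketch).
    """
    n_genes = len(true[0])
    scores = []
    for g in range(n_genes):
        r = pearson_r([row[g] for row in pred], [row[g] for row in true])
        if not math.isnan(r):
            scores.append(r)
    return sum(scores) / len(scores)
```

Averaging per-gene correlations weights every gene equally regardless of its expression level, which is one common reason this metric is reported alongside, rather than instead of, an error measure.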
Abstract
Pathology foundation models learn morphological representations through self-supervised pretraining on large-scale whole-slide images, yet they do not explicitly capture the underlying molecular state of the tissue. Spatial transcriptomics technologies bridge this gap by measuring gene expression in situ, offering a natural cross-modal supervisory signal. We propose MINT (Molecularly Informed Training), a fine-tuning framework that incorporates spatial transcriptomics supervision into pretrained pathology Vision Transformers. MINT appends a learnable ST token to the ViT input to encode transcriptomic information separately from the morphological CLS token, and prevents catastrophic forgetting through DINO self-distillation and explicit feature anchoring to the frozen pretrained encoder. Gene expression regression at both spot-level (Visium) and patch-level (Xenium) resolutions provides…
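The token layout and the idea of pairing a regression objective with feature anchoring can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the cosine-distance form of the anchoring term, the linear regression head, the `anchor_weight` parameter, and all function names are hypothetical, and the DINO self-distillation term is omitted entirely.

```python
import math


def build_tokens(cls_token, patch_tokens, st_token):
    """Assemble the ViT input as [CLS] + patch tokens + [ST].

    The ST token is appended at the end, so the CLS token and patch
    tokens keep the positions they had in the pretrained model.
    """
    return [cls_token] + list(patch_tokens) + [st_token]


def linear_head(feature, weights, bias):
    """Hypothetical linear head mapping the ST output feature to gene values."""
    return [sum(w * f for w, f in zip(row, feature)) + b
            for row, b in zip(weights, bias)]


def mse(pred, target):
    """Mean squared error between predicted and measured expression."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def cosine_distance(a, b):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def combined_loss(st_feature, cls_feature, frozen_cls_feature,
                  head_w, head_b, target_expr, anchor_weight=1.0):
    """Regression loss on the ST token plus an anchoring penalty on CLS.

    The anchoring term (assumed here to be a cosine distance) pulls the
    fine-tuned CLS feature toward the frozen pretrained encoder's CLS feature.
    """
    pred = linear_head(st_feature, head_w, head_b)
    regression = mse(pred, target_expr)
    anchoring = cosine_distance(cls_feature, frozen_cls_feature)
    return regression + anchor_weight * anchoring
```

Keeping the regression signal on a separate token while anchoring the CLS feature reflects the separation the abstract describes: transcriptomic supervision flows through the ST token, and the morphological CLS representation is held close to its pretrained value.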
Taxonomy
Topics: Single-cell and spatial transcriptomics · Cell Image Analysis Techniques · Domain Adaptation and Few-Shot Learning
