MPath: Multimodal Pathology Report Generation from Whole Slide Images
Noorul Wahab, Nasir Rajpoot

TL;DR
MPath is a novel multimodal framework that generates pathology reports from whole slide images by conditioning a frozen, pretrained biomedical language model (BioBART) on visual embeddings. It ranked 4th in Test Phase 2 of the RED 2025 Grand Challenge.
Contribution
Introduces MPath, a lightweight multimodal approach that uses visual-prefix prompting to generate pathology reports from WSIs without end-to-end vision-language pretraining.
Findings
Ranked 4th in Test Phase 2 of the RED 2025 Grand Challenge
Effective use of foundation-model WSI features
Prompt-based conditioning improves report generation
Abstract
Automated generation of diagnostic pathology reports directly from whole slide images (WSIs) is an emerging direction in computational pathology. Translating high-resolution tissue patterns into clinically coherent text remains difficult due to large morphological variability and the complex structure of pathology narratives. We introduce MPath, a lightweight multimodal framework that conditions a pretrained biomedical language model (BioBART) on WSI-derived visual embeddings through a learned visual-prefix prompting mechanism. Instead of end-to-end vision-language pretraining, MPath leverages foundation-model WSI features (CONCH + Titan) and injects them into BioBART via a compact projection module, keeping the language backbone frozen for stability and data efficiency. MPath was developed and evaluated on the RED 2025 Grand Challenge dataset and ranked 4th in Test Phase 2, despite…
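The visual-prefix prompting idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The single linear layer, the class `PrefixProjector`, the helper `build_encoder_inputs`, and all dimensions are hypothetical, and plain Python lists stand in for tensors. A slide-level embedding is projected into a few pseudo-token vectors in the language model's embedding space. These vectors are then prepended to the prompt's token embeddings. In a frozen-backbone setup, only the projector's weights would be trained.

```python
import random
from typing import List

Vector = List[float]

# Hypothetical sizes for illustration only; the abstract does not give dimensions.
WSI_DIM = 8      # stand-in for the slide-level feature size (CONCH + Titan features)
HIDDEN_DIM = 4   # stand-in for the language model's hidden size
NUM_PREFIX = 3   # hypothetical number of visual prefix tokens


class PrefixProjector:
    """Hypothetical projection: one slide embedding -> NUM_PREFIX pseudo-token embeddings.

    With a frozen language model, these weights would be the trainable parameters.
    """

    def __init__(self, in_dim: int, hidden_dim: int, num_prefix: int, seed: int = 0):
        rng = random.Random(seed)
        out_dim = hidden_dim * num_prefix
        self.hidden_dim = hidden_dim
        self.num_prefix = num_prefix
        # Small random init for a single linear layer (weight shape: out_dim x in_dim)
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, slide_embedding: Vector) -> List[Vector]:
        # Linear map to a flat vector of length num_prefix * hidden_dim
        flat = [
            sum(w * x for w, x in zip(row, slide_embedding)) + b
            for row, b in zip(self.weight, self.bias)
        ]
        # Split the flat vector into num_prefix rows, each of size hidden_dim
        return [
            flat[i * self.hidden_dim:(i + 1) * self.hidden_dim]
            for i in range(self.num_prefix)
        ]


def build_encoder_inputs(prefix: List[Vector], token_embeddings: List[Vector],
                         attention_mask: List[int]):
    """Prepend visual prefix rows to the prompt's token embeddings and extend the mask."""
    inputs = prefix + token_embeddings
    mask = [1] * len(prefix) + attention_mask  # prefix positions are always attended to
    return inputs, mask


if __name__ == "__main__":
    slide = [0.1 * i for i in range(WSI_DIM)]         # toy slide embedding
    prompt = [[0.0] * HIDDEN_DIM for _ in range(5)]   # toy prompt token embeddings
    projector = PrefixProjector(WSI_DIM, HIDDEN_DIM, NUM_PREFIX)
    inputs, mask = build_encoder_inputs(projector(slide), prompt, [1] * 5)
    print(len(inputs), len(mask))
```

In a PyTorch / Hugging Face setting, the concatenated rows would correspond to an `inputs_embeds` tensor passed to a BART-style encoder-decoder along with the extended `attention_mask`. The abstract does not state whether MPath uses exactly this interface.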
Taxonomy
Topics: Multimodal Machine Learning Applications · AI in cancer detection · Topic Modeling
