MGPATH: Vision-Language Model with Multi-Granular Prompt Learning for Few-Shot WSI Classification

Anh-Tien Nguyen; Duy Minh Ho Nguyen; Nghiem Tuong Diep; Trung Quoc Nguyen; Nhat Ho; Jacqueline Michelle Metsch; Miriam Cindy Maurer; Daniel Sonntag; Hanibal Bohnenberger; Anne-Christin Hauschild

arXiv:2502.07409 · cs.CV · November 4, 2025

TL;DR

This paper presents MGPATH, a novel vision-language model with multi-granular prompt learning designed for few-shot whole slide image classification, effectively capturing detailed and contextual features to improve accuracy in pathology analysis.

Contribution

The paper introduces a multi-granular attention mechanism and a contrastive learning framework to adapt large vision-language models for few-shot pathology classification, enhancing feature interaction and robustness.

Findings

1. Outperforms recent competitors on multiple pathology datasets.
2. Improves recognition of complex patterns across sub-regions.
3. Enhances model robustness with optimal transport-based distance.
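
To make the optimal transport-based distance in the last finding more concrete, here is a minimal sketch of an entropic (Sinkhorn) transport distance between two small sets of embeddings. It is a generic illustration, not the paper's implementation: this summary does not state which solver, cost function, or embedding sets MGPATH uses, so the cosine cost, the uniform weights, the function names, and the `eps` and `n_iters` values are all assumptions.

```python
import math


def cosine_cost(xs, ys):
    """Pairwise cost 1 - cosine similarity (assumes no all-zero vectors)."""
    def norm(v):
        return math.sqrt(sum(a * a for a in v))
    return [
        [1.0 - sum(a * b for a, b in zip(x, y)) / (norm(x) * norm(y)) for y in ys]
        for x in xs
    ]


def sinkhorn_distance(cost, eps=0.1, n_iters=200):
    """Approximate OT cost with uniform marginals via Sinkhorn iterations.

    eps (entropic regularization) and n_iters are illustrative defaults.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n          # uniform mass on the first set
    b = [1.0 / m] * m          # uniform mass on the second set
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    return sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))


# Hypothetical 2-D embeddings standing in for patch and prompt features.
patches = [[1.0, 0.0], [0.0, 1.0]]
matching_prompts = [[1.0, 0.0], [0.0, 1.0]]
opposed_prompts = [[-1.0, 0.0], [0.0, -1.0]]

d_match = sinkhorn_distance(cosine_cost(patches, matching_prompts))
d_opposed = sinkhorn_distance(cosine_cost(patches, opposed_prompts))
```

In this toy setup the distance between matching sets is close to zero, while the opposed set is much farther away. Unlike a single averaged similarity score, the transport plan matches elements of the two sets one to one before summing their costs.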

Abstract

Whole slide pathology image classification presents challenges due to gigapixel image sizes and limited annotation labels, hindering model generalization. This paper introduces a prompt learning method to adapt large vision-language models for few-shot pathology classification. We first extend the Prov-GigaPath vision foundation model, pre-trained on 1.3 billion pathology image tiles, into a vision-language model by adding adaptors and aligning it with medical text encoders via contrastive learning on 923K image-text pairs. The model is then used to extract visual features and text embeddings from few-shot annotations and is fine-tuned with learnable prompt embeddings. Unlike prior methods that combine prompts with frozen features using prefix embeddings or self-attention, we propose multi-granular attention that compares interactions between learnable prompts with individual image patches…
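
The contrastive alignment step mentioned in the abstract can be illustrated with a CLIP-style symmetric loss over a batch of paired image and text embeddings. This is a sketch under stated assumptions rather than the authors' training code: the abstract does not give the loss formulation, so the symmetric cross-entropy form, the function names, and the temperature of 0.07 are all assumptions.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Average of image-to-text and text-to-image cross-entropy.

    Pair k of image_embs and text_embs is treated as the positive;
    every other pairing in the batch acts as a negative.
    The temperature value is an illustrative assumption.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    logits = [
        [sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
        for i in imgs
    ]
    loss_i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)


# Hypothetical toy batch: two tiles and two captions.
image_embs = [[1.0, 0.0], [0.0, 1.0]]
aligned_text = [[1.0, 0.0], [0.0, 1.0]]
swapped_text = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = symmetric_contrastive_loss(image_embs, aligned_text)
loss_swapped = symmetric_contrastive_loss(image_embs, swapped_text)
```

The loss is small when each image embedding is closest to its own caption and large when captions are swapped. Minimizing it is what pulls the adaptor outputs and the text encoder into a shared embedding space.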

Code & Models

Repositories

HauschildLab/MGPATH
PyTorch · Official

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning

Methods: Softmax · Attention Is All You Need · Contrastive Learning · Contrastive Language-Image Pre-training · Pathology Language and Image Pre-Training