MGPATH: Vision-Language Model with Multi-Granular Prompt Learning for Few-Shot WSI Classification
Anh-Tien Nguyen, Duy Minh Ho Nguyen, Nghiem Tuong Diep, Trung Quoc Nguyen, Nhat Ho, Jacqueline Michelle Metsch, Miriam Cindy Maurer, Daniel Sonntag, Hanibal Bohnenberger, Anne-Christin Hauschild

TL;DR
This paper presents MGPATH, a novel vision-language model with multi-granular prompt learning designed for few-shot whole slide image classification, effectively capturing detailed and contextual features to improve accuracy in pathology analysis.
Contribution
The paper introduces a multi-granular attention mechanism and a contrastive learning framework to adapt large vision-language models for few-shot pathology classification, enhancing feature interaction and robustness.
Findings
Outperforms recent competitors on multiple pathology datasets.
Improves recognition of complex patterns across sub-regions.
Enhances model robustness with optimal transport-based distance.
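As a rough illustration of what an optimal transport-based distance between two sets of embeddings can look like, the sketch below uses entropy-regularised Sinkhorn iterations over a cosine cost matrix. The summary above does not say how MGPATH computes its distance, so the Sinkhorn solver, the uniform marginals, the cosine cost, and the names `cosine_cost`, `sinkhorn_distance`, `eps`, and `n_iters` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine_cost(xs, ys):
    """Cost matrix C[i][j] = 1 - cosine similarity (assumes non-zero vectors)."""
    def norm(v):
        return math.sqrt(sum(a * a for a in v))

    return [
        [1.0 - sum(a * b for a, b in zip(x, y)) / (norm(x) * norm(y)) for y in ys]
        for x in xs
    ]


def sinkhorn_distance(cost, eps=0.1, n_iters=200):
    """Entropy-regularised OT between two uniform distributions (illustrative).

    Returns the transport cost <plan, cost> and the transport plan itself.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on the first set (assumption)
    b = [1.0 / m] * m  # uniform mass on the second set (assumption)
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    dist = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return dist, plan


# Hypothetical toy embeddings standing in for prompt and patch features.
prompts = [[1.0, 0.0], [0.0, 1.0]]
patches = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
distance, plan = sinkhorn_distance(cosine_cost(prompts, patches))
```

Unlike a single cosine score between pooled vectors, the transport plan matches each element of one set to elements of the other, which is one plausible reason such a distance could be more robust to noisy individual embeddings.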
Abstract
Whole slide pathology image classification presents challenges due to gigapixel image sizes and limited annotation labels, hindering model generalization. This paper introduces a prompt learning method to adapt large vision-language models for few-shot pathology classification. We first extend the Prov-GigaPath vision foundation model, pre-trained on 1.3 billion pathology image tiles, into a vision-language model by adding adaptors and aligning it with medical text encoders via contrastive learning on 923K image-text pairs. The model is then used to extract visual features and text embeddings from few-shot annotations and is fine-tuned with learnable prompt embeddings. Unlike prior methods that combine prompts with frozen features using prefix embeddings or self-attention, we propose multi-granular attention that compares interactions between learnable prompts with individual image patches…
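The image-text alignment step can be pictured with a CLIP-style symmetric contrastive loss, in which each image embedding should score highest against its own caption and vice versa. The abstract only says "contrastive learning", so the exact loss form, the temperature value, and the helper names `clip_style_loss`, `_normalize`, and `_cross_entropy_diag` below are assumptions for illustration only.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (assumes it is non-zero)."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_embs[i] and text_embs[i] are assumed to describe the same pair.
    """
    img = [_normalize(v) for v in image_embs]
    txt = [_normalize(v) for v in text_embs]
    # Cosine similarities scaled by the temperature: rows index images.
    logits = [
        [sum(a * b for a, b in zip(i_vec, t_vec)) / temperature for t_vec in txt]
        for i_vec in img
    ]
    logits_t = [list(col) for col in zip(*logits)]  # rows index texts
    image_to_text = _cross_entropy_diag(logits)
    text_to_image = _cross_entropy_diag(logits_t)
    return 0.5 * (image_to_text + text_to_image)


# Hypothetical toy batch: two image-text pairs in a 2-D embedding space.
aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the correctly paired embeddings yield a much smaller loss than the shuffled ones, which is the signal that would push adaptor weights toward a shared image-text space during training.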
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning
Methods: Softmax · Attention Is All You Need · Contrastive Learning · Contrastive Language-Image Pre-training · Pathology Language and Image Pre-Training
