Multi-Scale Prototypical Transformer for Whole Slide Image Classification
Saisai Ding, Jun Wang, Juncheng Li, and Jun Shi

TL;DR
This paper introduces a multi-scale prototypical Transformer that improves whole slide image classification by fusing multi-scale features and reducing redundant instances in bags, and it outperforms existing methods on public datasets.
Contribution
The paper presents MSPT, a novel model that combines prototypical learning with the Transformer architecture and adds multi-scale feature fusion to improve WSI classification.
Findings
MSPT outperforms existing algorithms on public datasets.
Prototypical Transformer reduces redundant instances effectively.
Multi-scale feature fusion improves classification accuracy.
Abstract
Whole slide image (WSI) classification is an essential task in computational pathology. Despite recent advances in multiple instance learning (MIL) for WSI classification, accurate classification of WSIs remains challenging due to the extreme imbalance between positive and negative instances in bags, and the complicated pre-processing required to fuse the multi-scale information of WSIs. To this end, we propose a novel multi-scale prototypical Transformer (MSPT) for WSI classification, which includes a prototypical Transformer (PT) module and a multi-scale feature fusion module (MFFM). The PT is developed to reduce redundant instances in bags by integrating prototypical learning into the Transformer architecture. It substitutes all instances with cluster prototypes, which are then re-calibrated through the self-attention mechanism of the Transformer. Thereafter, an MFFM is proposed to fuse…
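The prototype substitution step described in the abstract can be sketched in Python with NumPy. This is a minimal, hypothetical illustration, not the authors' implementation: the abstract does not specify the clustering algorithm, the number of prototypes, or the attention configuration, so k-means clustering, a single attention head with a residual connection, and mean pooling into one bag vector are all assumptions, as are the function names kmeans_prototypes, self_attention, and prototypical_bag_embedding.

```python
import numpy as np


def kmeans_prototypes(feats, k, iters=20, seed=0):
    """Cluster instance features (N, D) into k prototypes (k, D) with plain k-means.

    Assumption: the paper's actual clustering method is not given in the abstract.
    """
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct instances.
    centers = feats[rng.choice(len(feats), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every instance to every center: (N, k).
        d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = feats[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the rows of x (k, D)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    # Residual connection: each prototype is re-calibrated by the others.
    return x + attn @ v


def prototypical_bag_embedding(feats, k, wq, wk, wv):
    """Replace N instances with k prototypes, re-calibrate them, pool to one vector."""
    protos = kmeans_prototypes(feats, k)
    recalibrated = self_attention(protos, wq, wk, wv)
    return recalibrated.mean(axis=0)  # assumption: mean pooling over prototypes


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    feats = rng.normal(size=(1000, 64))  # stand-in for 1,000 patch embeddings
    # Random weights stand in for learned query/key/value projections.
    wq, wk, wv = (rng.normal(scale=0.1, size=(64, 64)) for _ in range(3))
    bag = prototypical_bag_embedding(feats, 16, wq, wk, wv)
    print(bag.shape)  # (64,)
```

One practical consequence of this design: self-attention now runs over k prototypes instead of all N instances, so its cost per bag falls from O(N²) to O(k²) in the number of tokens.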
Taxonomy
Topics: Digital Imaging for Blood Diseases · AI in cancer detection · Image Retrieval and Classification Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Absolute Position Encodings · Byte Pair Encoding · Average Pooling · Linear Layer · Label Smoothing · Adam · Position-Wise Feed-Forward Layer
