PTCMIL: Multiple Instance Learning via Prompt Token Clustering for Whole Slide Image Analysis

Beidi Zhao; SangMook Kim; Hao Chen; Chen Zhou; Zu-hua Gao; Gang Wang; Xiaoxiao Li

arXiv:2507.18848 · cs.CV · July 28, 2025

Open Access

TL;DR

PTCMIL introduces a novel ViT-based multiple instance learning method with prompt token clustering, enhancing whole slide image analysis by improving robustness, efficiency, and interpretability across various tasks.

Contribution

It proposes a unified end-to-end framework that integrates learnable prompt tokens with clustering for WSI analysis, addressing heterogeneity and computational challenges.

Findings

01. Outperforms state-of-the-art methods on eight datasets.

02. Demonstrates superior accuracy in classification and survival analysis.

03. Shows robustness and interpretability through ablation studies.

Abstract

Multiple Instance Learning (MIL) has advanced whole slide image (WSI) analysis but struggles with the complexity and heterogeneity of WSIs. Existing MIL methods face challenges in aggregating diverse patch information into robust WSI representations. While Vision Transformers (ViTs) and clustering-based approaches show promise, they are computationally intensive and fail to capture task-specific and slide-specific variability. To address these limitations, we propose PTCMIL, a novel Prompt Token Clustering-based ViT for MIL aggregation. By introducing learnable prompt tokens into the ViT backbone, PTCMIL unifies clustering and prediction tasks in an end-to-end manner. It dynamically aligns clustering with downstream tasks, using projection-based clustering tailored to each WSI, reducing complexity while preserving patch heterogeneity. Through token merging and prototype-based pooling, PTCMIL efficiently captures task-relevant…
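
The pipeline named in the abstract (projection-based clustering against prompt tokens, token merging, prototype-based pooling) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes, purely for illustration, that each patch is assigned to the prompt token with the largest dot product, that merging is a per-cluster mean, and that pooling is a mean over non-empty prototypes. The function names `assign_clusters`, `merge_tokens`, and `pool_prototypes` are hypothetical, and the prompt tokens are fixed vectors here rather than parameters learned inside a ViT.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def assign_clusters(patches, prompts):
    """Assumed projection step: project each patch onto every prompt token
    and pick the index of the prompt with the largest dot product."""
    return [
        max(range(len(prompts)), key=lambda k: dot(p, prompts[k]))
        for p in patches
    ]


def merge_tokens(patches, labels, num_clusters):
    """Assumed merging step: average the patch embeddings in each cluster.
    Returns (prototypes, counts); an empty cluster yields None."""
    dim = len(patches[0])
    sums = [[0.0] * dim for _ in range(num_clusters)]
    counts = [0] * num_clusters
    for p, k in zip(patches, labels):
        counts[k] += 1
        for j, x in enumerate(p):
            sums[k][j] += x
    prototypes = [
        [s / c for s in row] if c else None
        for row, c in zip(sums, counts)
    ]
    return prototypes, counts


def pool_prototypes(prototypes):
    """Assumed pooling step: mean over the non-empty prototypes,
    giving one slide-level vector."""
    kept = [p for p in prototypes if p is not None]
    dim = len(kept[0])
    return [sum(p[j] for p in kept) / len(kept) for j in range(dim)]


# Toy slide: four 2-D patch embeddings and two fixed "prompt tokens".
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.8]]
prompts = [[1.0, 0.0], [0.0, 1.0]]

labels = assign_clusters(patches, prompts)
prototypes, counts = merge_tokens(patches, labels, len(prompts))
slide_vector = pool_prototypes(prototypes)
```

In this toy case the four patches collapse to two prototypes before pooling, which is the kind of reduction in token count the abstract credits for lower complexity. How PTCMIL actually computes the projection, merges tokens, and weights prototypes is not specified in the text above.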

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning