Towards Hierarchical Regional Transformer-based Multiple Instance Learning
Josef Cersovsky, Sadegh Mohammadi, Dagmar Kainmueller, Johannes Hoehne

TL;DR
This paper introduces a hierarchical regional Transformer-based multiple instance learning method for classifying gigapixel histopathology images, improving accuracy by focusing on high-attention regions and processing features at multiple spatial levels.
Contribution
It replaces traditional attention with a regional, hierarchical self-attention mechanism inspired by Vision Transformers, enhancing classification performance in digital pathology.
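As a rough illustration of regional, stackable self-attention aggregation, the sketch below splits a grid of patch embeddings into square regions, applies single-head self-attention within each region, and pools each region to one vector. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `regional_aggregate`, the mean pooling step, the random weights, and the grid and region sizes are all hypothetical choices made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def regional_aggregate(grid, region, wq, wk, wv):
    """Fuse each (region x region) block of patch embeddings into one vector.

    grid:   (H, W, D) array of patch embeddings laid out on the slide grid.
    region: side length of the square regions; must divide H and W.
    wq, wk, wv: (D, D) projection matrices (random here, learned in practice).
    Returns an (H // region, W // region, D) array of region embeddings.
    """
    h, w, d = grid.shape
    assert h % region == 0 and w % region == 0
    gh, gw = h // region, w // region
    # Group patches by region: (gh, gw, region * region, D).
    tokens = (
        grid.reshape(gh, region, gw, region, d)
        .transpose(0, 2, 1, 3, 4)
        .reshape(gh, gw, region * region, d)
    )
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    # Scaled dot-product self-attention restricted to each region.
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    attended = softmax(scores, axis=-1) @ v
    # Assumption: mean pooling turns the attended tokens into one vector.
    return attended.mean(axis=2)


rng = np.random.default_rng(0)
d = 8
grid = rng.normal(size=(8, 8, d))  # toy 8x8 grid of patch embeddings
w1 = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]
w2 = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]

level1 = regional_aggregate(grid, 2, *w1)    # 2x2 regions -> 4x4 grid
level2 = regional_aggregate(level1, 4, *w2)  # one 4x4 region -> 1x1 grid
slide_embedding = level2.reshape(d)          # input to a slide-level classifier
```

Because the output keeps the grid layout and embedding width of the input, the same function can be applied again to the coarser grid, which is one way to read "hierarchical" here: each level attends over a larger spatial neighbourhood than the one before.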
Findings
Significant performance improvement over the baseline on two histopathology datasets.
Effective focus on high-attention regions during inference.
Hierarchical processing of features at different spatial levels.
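The idea of focusing on high-attention regions during inference can be sketched as a simple top-k selection over per-region attention scores. The summary does not specify how regions are chosen, so the helper `select_high_attention_regions`, the parameter `k`, and the score values below are assumptions made purely for illustration.

```python
import numpy as np


def select_high_attention_regions(scores, k):
    """Return a boolean mask marking the k regions with the highest score.

    scores: (gh, gw) array with one attention score per region.
    k:      number of regions to keep (hypothetical tuning parameter).
    """
    flat = scores.ravel()
    # Indices of the k largest scores (order within the top k is arbitrary).
    top = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(flat.size, dtype=bool)
    mask[top] = True
    return mask.reshape(scores.shape)


# Toy attention map over a 3x3 grid of regions.
scores = np.array([
    [0.05, 0.40, 0.02],
    [0.03, 0.10, 0.30],
    [0.01, 0.04, 0.05],
])
mask = select_high_attention_regions(scores, k=3)
focused = np.argwhere(mask)  # (row, col) of regions to process further
```

In this toy case the three selected regions are those scoring 0.40, 0.30, and 0.10; a second pass could then be restricted to patches from those regions only.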
Abstract
The classification of gigapixel histopathology images with deep multiple instance learning models has become a critical task in digital pathology and precision medicine. In this work, we propose a Transformer-based multiple instance learning approach that replaces the traditional learned attention mechanism with a regional, Vision Transformer inspired self-attention mechanism. We present a method that fuses regional patch information to derive slide-level predictions and show how this regional aggregation can be stacked to hierarchically process features on different distance levels. To increase predictive accuracy, especially for datasets with small, local morphological features, we introduce a method to focus the image processing on high attention regions during inference. Our approach is able to significantly improve performance over the baseline on two histopathology datasets and…
Taxonomy
Topics: AI in cancer detection · Colorectal Cancer Screening and Detection · Digital Imaging for Blood Diseases
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dropout · Position-Wise Feed-Forward Layer · Vision Transformer · Byte Pair Encoding · Adam · Focus
