Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis
Honglin Li, Yunlong Zhang, Pingyi Chen, Zhongyi Shui, Chenglu Zhu, Lin Yang

TL;DR
This paper introduces a local-global hybrid Transformer model, LongMIL, that enhances long-context WSI analysis by improving attention matrix rank, reducing computational complexity, and effectively modeling local and global information for better cancer diagnosis.
Contribution
The paper proposes a novel local attention mask and a hybrid Transformer architecture, significantly improving long-context WSI analysis performance and efficiency over existing methods.
Findings
Local attention mask improves attention matrix rank.
Linear complexity achieved via chunked local attention.
LongMIL outperforms existing models on WSI tasks.
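The second finding can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a 1-D sequence of patch embeddings split into fixed-size, non-overlapping chunks, whereas the paper's local mask is built for WSI patches. The function names and the `chunk` parameter are hypothetical. The sketch shows that attending only within each chunk gives the same result as full attention under a block-diagonal mask, while computing score blocks of size at most chunk × chunk instead of one n × n matrix, so the cost grows linearly in n for a fixed chunk size.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries set to -inf become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def chunked_local_attention(q, k, v, chunk):
    """Attention restricted to non-overlapping chunks (hypothetical helper).

    Only one (chunk x chunk) score block is materialized at a time, so the
    total work is O(n * chunk * d) rather than O(n^2 * d).
    """
    n, d = q.shape
    out = np.empty_like(v)
    for start in range(0, n, chunk):
        end = min(start + chunk, n)
        scores = q[start:end] @ k[start:end].T / np.sqrt(d)
        out[start:end] = softmax(scores) @ v[start:end]
    return out


def masked_full_attention(q, k, v, chunk):
    """Reference: full n x n attention with a block-diagonal local mask."""
    n, d = q.shape
    chunk_id = np.arange(n) // chunk
    mask = chunk_id[:, None] == chunk_id[None, :]  # True inside a chunk
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)       # block out cross-chunk pairs
    return softmax(scores) @ v


# Usage: both routes agree up to floating-point error.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((10, 4)) for _ in range(3))
diff = np.abs(chunked_local_attention(q, k, v, 4) - masked_full_attention(q, k, v, 4)).max()
print("max abs difference:", diff)
```

The masked version is included only as a correctness reference; in practice the chunked route is the one that avoids the quadratic score matrix.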
Abstract
Histopathology Whole Slide Image (WSI) analysis serves as the gold standard for clinical cancer diagnosis in the daily routines of doctors. To develop computer-aided diagnosis models for WSIs, previous methods typically employ Multi-Instance Learning to enable slide-level prediction given only slide-level labels. Among these models, vanilla attention mechanisms without pairwise interactions have traditionally been employed but are unable to model contextual information. More recently, self-attention models have been utilized to address this issue. To alleviate the computational complexity of long sequences in large WSIs, methods like HIPT use region-slicing, and TransMIL employs approximation of full self-attention. Both approaches suffer from suboptimal performance due to the loss of key information. Moreover, their use of absolute positional embedding struggles to effectively handle…
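To make the abstract's contrast concrete, here is a minimal sketch of attention pooling "without pairwise interactions", in the style commonly used for attention-based Multi-Instance Learning. It is an illustration under assumptions, not code from the paper: the function name, the parameters `V` and `w`, and the tanh scoring form are chosen for the example. Each patch receives a score computed from its own embedding only, so no term ever compares two patches, which is why this kind of pooling cannot model context between patches.

```python
import numpy as np


def attention_mil_pool(h, V, w):
    """Pool n patch embeddings into one slide embedding (hypothetical helper).

    h: (n, d) patch embeddings, V: (d, m) projection, w: (m,) scoring vector.
    The score of patch i depends on h[i] alone; there are no patch-pair terms.
    """
    scores = np.tanh(h @ V) @ w        # (n,) one scalar per patch
    scores = scores - scores.max()     # stabilize the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum()  # attention weights sum to 1
    return weights @ h, weights        # (d,) slide embedding, (n,) weights


# Usage with random stand-in features (not real WSI data).
rng = np.random.default_rng(0)
h = rng.standard_normal((6, 8))
V = rng.standard_normal((8, 5))
w = rng.standard_normal(5)
slide_embedding, weights = attention_mil_pool(h, V, w)
print(slide_embedding.shape, weights.sum())
```

Because the weights are computed per patch, shuffling the patches leaves the slide embedding unchanged. Self-attention, by contrast, scores every pair of patches, which is what makes long sequences expensive.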
Taxonomy
Topics: AI in cancer detection · Cell Image Analysis Techniques · Digital Imaging for Blood Diseases
Methods: Attention Is All You Need · Dense Connections · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Adam · Linear Layer · Softmax · Multi-Head Attention · Dropout
