Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics
Chunyuan Li, Xinliang Zhu, Jiawen Yao, Junzhou Huang

TL;DR
This paper introduces a hierarchical multimodal transformer that effectively integrates multi-resolution pathology images and genomics data for improved cancer survival prediction, reducing computational costs while enhancing accuracy.
Contribution
A novel hierarchical multimodal transformer framework that combines multi-resolution WSIs and genomics data for better survival prediction with lower resource requirements.
Findings
Achieved an average c-index of 0.673 across five cancer types.
Outperformed state-of-the-art multimodality methods.
Reduced GPU resource consumption compared to benchmarks.
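The c-index reported above measures how well predicted risk scores rank patients by survival time. The following is a minimal sketch of Harrell's concordance index, not the paper's evaluation code. The function name `concordance_index` and the toy data are illustrative assumptions, and pairs with tied survival times are simply skipped, which is a simplification.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs whose risk ordering matches survival.

    times  : observed follow-up times
    events : 1 if the event (death) was observed, 0 if censored
    risks  : predicted risk scores (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times (simplification) and pairs whose earlier patient
        # is censored, since their true ordering is unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration): 4 patients, the third is censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.8, 0.1]
toy_cindex = concordance_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported average of 0.673 sits between the two.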
Abstract
Learning a good representation of gigapixel whole slide pathology images (WSIs) for downstream tasks is critical. Previous studies employ multiple instance learning (MIL) to represent WSIs as bags of sampled patches because, in most cases, only slide-level labels are available and only a tiny region of the WSI is disease-positive. However, WSI representation learning remains an open problem for two reasons: (1) patches sampled at a higher resolution may fail to capture microenvironment information, such as the relative position of tumor cells and surrounding tissues, while patches at a lower resolution lose fine-grained detail; (2) extracting patches from a giant WSI results in a large bag size, which greatly increases the computational cost. To solve these problems, this paper proposes a hierarchical-based multimodal transformer framework that learns a…
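To make the MIL setting in the abstract concrete, here is a minimal sketch of pooling a bag of patch features into a single slide-level vector with softmax attention weights. This is a generic attention-pooling baseline, not the hierarchical transformer the paper proposes. The function name `attention_pool`, the linear scoring vector `score_weights`, and the toy features are all hypothetical.

```python
import math


def attention_pool(bag, score_weights):
    """Pool a bag of patch feature vectors into one slide-level vector.

    bag           : list of n patch feature vectors, each of length d
    score_weights : length-d vector standing in for learned parameters
                    (hypothetical; a real model would learn these)
    """
    # One scalar relevance score per patch (linear scoring, a simplification).
    scores = [sum(w * x for w, x in zip(score_weights, patch)) for patch in bag]
    # Numerically stable softmax over the patches in the bag.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # Attention-weighted average of the patch features.
    dim = len(bag[0])
    slide = [sum(a * patch[k] for a, patch in zip(attn, bag)) for k in range(dim)]
    return slide, attn


# Toy bag of three 2-dimensional patch features (made up for illustration).
toy_bag = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
uniform_slide, uniform_attn = attention_pool(toy_bag, [0.0, 0.0])
```

The work here grows with the number of patches in the bag, which is why the large bags produced by high-resolution sampling raise the computational cost noted in point (2).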
Taxonomy
Topics: AI in cancer detection · Radiomics and Machine Learning in Medical Imaging · Colorectal Cancer Screening and Detection
Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Adam · Absolute Position Encodings · Softmax · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing
