Radiomics-Guided Vision Transformers for Survival Analysis
Qiyuan Shi, Yi Li

TL;DR
This paper introduces a radiomics-guided hybrid model built on Vision Transformers (ViTs) for survival analysis. It targets better interpretability alongside competitive discrimination, and is demonstrated on COVID-19 chest X-ray data.
Contribution
The paper proposes a multimodal Cox framework that integrates pixel-based embeddings with interpretable radiomic features using contrastive alignment, and studies how ViT attention mechanisms can support interpretability under survival objectives.
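For readers unfamiliar with Cox-style training objectives, the following is a minimal sketch of a negative Cox partial log-likelihood applied to a risk score fused from two modalities. The linear late-fusion in `fused_risk`, the weight names, and the normalization by event count are illustrative assumptions; the summary above does not specify how the authors combine the modalities or implement the loss.

```python
import numpy as np

def cox_neg_partial_log_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood (Breslow-style handling of ties).

    risk  : (n,) predicted log-risk scores
    time  : (n,) observed follow-up times
    event : (n,) 1 if the event was observed, 0 if censored
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    # at_risk[i, j] is True when subject j is still at risk at time[i]
    at_risk = time[None, :] >= time[:, None]
    # Stable log-sum-exp of the risk scores over each risk set
    m = risk.max()
    log_denominator = m + np.log((at_risk * np.exp(risk - m)[None, :]).sum(axis=1))
    # Only subjects with an observed event contribute a term;
    # dividing by the event count is a convenience, not part of the likelihood
    return -np.sum(event * (risk - log_denominator)) / max(event.sum(), 1.0)

def fused_risk(pixel_emb, radiomic_feats, w_pixel, w_radiomic):
    """Hypothetical late fusion: a risk score linear in both modalities."""
    return pixel_emb @ w_pixel + radiomic_feats @ w_radiomic
```

In practice the same quantity would be computed in an autodiff framework so that it can be minimized with respect to the network weights; NumPy is used here only to keep the sketch self-contained.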
Findings
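In controlled experiments with shallow ViTs, token-level attention dynamics can recover outcome-relevant regions.
Attention-based thresholding enables token pruning that improves both interpretability and predictive performance.
The radiomics-guided hybrid model achieves competitive discrimination on a COVID-19 chest X-ray cohort with a composite ICU admission or mortality endpoint.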
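The token-pruning finding can be illustrated with a small sketch of attention-based thresholding. It assumes that each patch token has a single attention score (for example, attention from a class token averaged over heads) and that a fixed cutoff is used; both the scoring rule and the `threshold` parameter are hypothetical choices, not details taken from the paper.

```python
import numpy as np

def prune_tokens_by_attention(tokens, attention, threshold):
    """Keep patch tokens whose attention score is at or above `threshold`.

    tokens    : (num_patches, dim) patch embeddings
    attention : (num_patches,) one attention score per patch (assumed given)
    threshold : scalar cutoff (hypothetical hyperparameter)
    Returns the kept tokens and their original indices.
    """
    tokens = np.asarray(tokens)
    attention = np.asarray(attention, dtype=float)
    keep = attention >= threshold
    if not keep.any():
        # Fall back to the most-attended token so the output is never empty
        keep[np.argmax(attention)] = True
    return tokens[keep], np.flatnonzero(keep)

# Usage: four patches, two of which pass the cutoff
tokens = np.arange(8).reshape(4, 2)
attention = [0.10, 0.40, 0.05, 0.45]
kept_tokens, kept_idx = prune_tokens_by_attention(tokens, attention, threshold=0.25)
```

Returning the kept indices makes it possible to map the surviving tokens back to image patches, which is what allows the pruned set to be read as a saliency map.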
Abstract
Vision Transformers (ViTs) have shown strong empirical performance on high-dimensional medical imaging data, yet their behavior under survival objectives and the interpretability of their attention mechanisms remain poorly understood. Under shallow ViTs, we design controlled experiments showing that token-level attention dynamics can recover outcome-relevant regions and that attention-based thresholding enables effective token pruning, improving both interpretability and predictive performance. We also study pretrained deep ViTs for survival analysis and propose a radiomics-guided hybrid model that integrates pixel-based embeddings with interpretable radiomic features through a multimodal Cox framework and contrastive alignment. Applied to a COVID-19 chest X-ray cohort with a composite ICU admission or mortality endpoint, the proposed approach achieves competitive discrimination while…
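Discrimination in survival models is commonly summarized with Harrell's concordance index (C-index). The summary does not state which metric the authors report, so the sketch below is only an assumption about how "competitive discrimination" might be measured on right-censored data.

```python
def concordance_index(time, event, risk):
    """Harrell's C-index: higher risk should correspond to shorter time to event.

    time  : sequence of observed follow-up times
    event : sequence of 1 (event observed) or 0 (censored)
    risk  : sequence of predicted risk scores
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # each comparable pair is anchored on an observed event
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of subjects by risk. This quadratic-time loop is meant for clarity; established survival-analysis libraries provide faster implementations.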
