Radiomics-Guided Vision Transformers for Survival Analysis

Qiyuan Shi; Yi Li

arXiv:2604.21056 · physics.med-ph · April 24, 2026

TL;DR

This paper introduces a radiomics-guided hybrid Vision Transformer model for survival analysis. It aims to improve interpretability while keeping discrimination competitive, and is demonstrated on COVID-19 chest X-ray data.

Contribution

The paper proposes a multimodal Cox framework that integrates pixel-based embeddings with interpretable radiomic features through contrastive alignment, and it studies attention mechanisms in shallow ViTs as a route to interpretability.

Findings

01

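In controlled experiments with shallow ViTs, token-level attention dynamics can recover outcome-relevant regions.

02

Attention-based thresholding enables token pruning that improves both interpretability and predictive performance.

03

The radiomics-guided hybrid model achieves competitive discrimination on a COVID-19 chest X-ray cohort with a composite ICU admission or mortality endpoint.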
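
To make the second finding concrete, the following is a minimal sketch of attention-based token pruning. It assumes a single fixed threshold applied to one attention weight per patch token. The function name `prune_tokens`, the toy values, and the threshold of 0.25 are illustrative assumptions and are not taken from the paper, which may use a different thresholding rule.

```python
def prune_tokens(tokens, attention, threshold):
    """Keep tokens whose attention weight is at least `threshold`.

    tokens:    list of token embeddings (one entry per image patch)
    attention: list of non-negative attention weights, one per token
    Returns (kept_tokens, kept_indices).

    Hypothetical helper for illustration only; the fixed-threshold rule
    is an assumption, not the paper's stated procedure.
    """
    if len(tokens) != len(attention):
        raise ValueError("tokens and attention must have the same length")
    kept_indices = [i for i, a in enumerate(attention) if a >= threshold]
    kept_tokens = [tokens[i] for i in kept_indices]
    return kept_tokens, kept_indices


# Toy example: four patch tokens with made-up attention weights.
patch_tokens = [[0.1, 0.2], [0.9, 0.4], [0.3, 0.3], [0.8, 0.7]]
patch_attention = [0.05, 0.45, 0.10, 0.40]

kept, idx = prune_tokens(patch_tokens, patch_attention, threshold=0.25)
```

In this toy case only the two patches with weights 0.45 and 0.40 survive, so a downstream survival head would see half as many tokens, and the surviving indices can be mapped back to image regions for inspection.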

Abstract

Vision Transformers (ViTs) have shown strong empirical performance on high-dimensional medical imaging data, yet their behavior under survival objectives and the interpretability of their attention mechanisms remain poorly understood. Using shallow ViTs, we design controlled experiments showing that token-level attention dynamics can recover outcome-relevant regions and that attention-based thresholding enables effective token pruning, improving both interpretability and predictive performance. We also study pretrained deep ViTs for survival analysis and propose a radiomics-guided hybrid model that integrates pixel-based embeddings with interpretable radiomic features through a multimodal Cox framework and contrastive alignment. Applied to a COVID-19 chest X-ray cohort with a composite ICU admission or mortality endpoint, the proposed approach achieves competitive discrimination while…
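
For readers unfamiliar with Cox-type survival objectives, the sketch below computes the standard negative log partial likelihood from per-subject risk scores, follow-up times, and event indicators. It is a generic textbook formulation using the Breslow convention for ties. It is not the paper's multimodal loss, and the function name and toy data are assumptions made for illustration. In a hybrid model, the risk score would be the output of a network that consumes both image embeddings and radiomic features.

```python
import math


def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Average negative log partial likelihood of a Cox model.

    risk_scores: list of predicted log-hazard scores, one per subject
    times:       list of follow-up times (event or censoring time)
    events:      list of 1 (event observed) or 0 (censored)

    Generic formulation for illustration; ties are handled with the
    Breslow convention (tied subjects share the same risk set).
    """
    n = len(risk_scores)
    total = 0.0
    n_events = 0
    for i in range(n):
        if not events[i]:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time times[i].
        at_risk = [risk_scores[j] for j in range(n) if times[j] >= times[i]]
        # Numerically stable log-sum-exp over the risk set.
        m = max(at_risk)
        log_sum = m + math.log(sum(math.exp(r - m) for r in at_risk))
        total += risk_scores[i] - log_sum
        n_events += 1
    if n_events == 0:
        return 0.0
    return -total / n_events


# Toy cohort: three subjects, the last one censored.
toy_times = [1.0, 2.0, 3.0]
toy_events = [1, 1, 0]

loss_flat = cox_neg_log_partial_likelihood([0.0, 0.0, 0.0], toy_times, toy_events)
loss_good = cox_neg_log_partial_likelihood([2.0, 1.0, 0.0], toy_times, toy_events)
loss_bad = cox_neg_log_partial_likelihood([0.0, 1.0, 2.0], toy_times, toy_events)
```

Scores that rank earlier failures as higher risk (`loss_good`) give a lower loss than uninformative (`loss_flat`) or reversed (`loss_bad`) scores, which is the ordering behavior that discrimination metrics such as the concordance index reward.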
