Medical Image Understanding Improves Survival Prediction via Visual Instruction Tuning

Xixi Liu; Jorge Lazo; Andreas Hallqvist; Mikael Johansson; Åse Johnsson; Jonas S Andersson; Ella Äng Eklund; Patrik Sund; Nasser Hosseini; Jennifer Alvén; Ida Häggström

arXiv:2604.18250·cs.CV·April 21, 2026

TL;DR

This paper introduces a vision-language model, pre-trained on 3D CT images paired with radiology reports, that improves survival prediction and generates clinically meaningful language responses.

Contribution

It presents a novel visual instruction tuning framework for 3D CT understanding that improves survival prediction and interpretability.

Findings

01

Outperforms baseline survival prediction methods.

02

Improves prediction, especially when clinical data are limited.

03

Generates clinically meaningful language responses.

Abstract

Accurate prognostication and risk estimation are essential for guiding clinical decision-making and optimizing patient management. While radiologist-assessed features from CT scans provide valuable indicators of disease severity and outcomes, interpreting such images requires expert knowledge, and translating rich visual information into textual summaries inevitably leads to information loss. In this work, we propose a vision-language framework for 3D CT image understanding that leverages large-scale, open-source CT images paired with radiology reports through visual instruction tuning. This pre-training enables the model to learn clinically meaningful visual-textual representations, which can then be adapted to downstream survival prediction tasks. By incorporating a survival prediction head on top of the pre-trained model, our approach improves survival prediction from CT images and…
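The abstract describes adding a survival prediction head on top of the pre-trained model but does not specify the head or its training loss. As a minimal sketch, assume a linear head that maps a pooled CT embedding to a scalar risk score and is trained with the Cox partial likelihood, a common choice for survival models that this abstract does not confirm. The names `risk_score`, `cox_loss`, and `c_index` are hypothetical, and the toy embeddings, times, and event flags are made up for illustration.

```python
import math
from typing import Sequence

# ASSUMPTION: the survival head is a single linear layer over a pooled
# embedding from the pre-trained CT encoder; weights here are illustrative.
def risk_score(embedding: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Hypothetical linear survival head: higher score = higher predicted risk."""
    return sum(e * w for e, w in zip(embedding, weights)) + bias


def cox_loss(risks: Sequence[float], times: Sequence[float], events: Sequence[int]) -> float:
    """Negative Cox partial log-likelihood, averaged over observed events.

    Tied event times use the Breslow approximation (shared risk set).
    """
    total, n_events = 0.0, 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients only appear in others' risk sets
        # Risk set: every patient still under observation at time t_i.
        at_risk = [r for r, t in zip(risks, times) if t >= t_i]
        m = max(at_risk)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(r - m) for r in at_risk))
        total += log_denom - risks[i]
        n_events += 1
    return total / n_events if n_events else 0.0


def c_index(risks: Sequence[float], times: Sequence[float], events: Sequence[int]) -> float:
    """Harrell's concordance index; pairs with tied times are skipped."""
    concordant, comparable = 0.0, 0
    for r_i, t_i, e_i in zip(risks, times, events):
        if not e_i:
            continue
        for r_j, t_j in zip(risks, times):
            if t_j > t_i:  # j outlived i, so i should have the higher risk
                comparable += 1
                concordant += 1.0 if r_i > r_j else 0.5 if r_i == r_j else 0.0
    return concordant / comparable if comparable else float("nan")


# Toy usage with made-up embeddings, follow-up times, and event flags.
weights = [1.0, -0.5]
embeddings = [[2.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
times = [1.0, 2.0, 3.0]
events = [1, 1, 0]  # third patient is censored
risks = [risk_score(x, weights) for x in embeddings]
print(risks)                          # [2.0, 1.0, -1.0]
print(c_index(risks, times, events))  # 1.0
print(cox_loss(risks, times, events))
```

In this sketch the head adds only a handful of parameters on top of the encoder's representation, and the C-index gives a ranking-based check of the predicted risks. Whether the authors freeze or fine-tune the pre-trained model when training their head is not stated in this abstract.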