A Disease-Centric Vision-Language Foundation Model for Precision Oncology in Kidney Cancer

Yuhui Tao; Zhongwei Zhao; Zilong Wang; Xufang Luo; Feng Chen; Kang Wang; Chuanfu Wu; Xue Zhang; Shaoting Zhang; Jiaxi Yao; Xingwei Jin; Xinyang Jiang; Yifan Yang; Dongsheng Li; Lili Qiu; Zhiqiang Shao; Jianming Guo; Nengwang Yu; Shuo Wang; Ying Xiong

arXiv:2508.16569·eess.IV·August 25, 2025

TL;DR

This study introduces RenalCLIP, a disease-specific vision-language model developed and validated on 27,866 CT scans from 8,809 patients. It supports the diagnosis, prognosis, and management of kidney cancer with high accuracy and data efficiency.

Contribution

The paper presents RenalCLIP, a disease-centric foundation model for renal mass analysis, and shows that it outperforms and generalizes better than existing models across 10 core clinical tasks.

Findings

01. RenalCLIP outperforms state-of-the-art models in 10 clinical tasks.

02. Achieves a 20% higher C-index in recurrence-free survival prediction.

03. Requires only 20% of training data to reach peak diagnostic performance.
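
The C-index cited in the findings measures how well a model's predicted risk ranks patients by time to recurrence. The following is a minimal sketch of Harrell's concordance index for right-censored data. The function name, the tie handling (pairs with tied times are skipped, tied risks count as half), and the toy values in the usage note are illustrative assumptions, not details taken from the paper.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk.

    times  -- follow-up time per patient
    events -- True/1 if recurrence was observed, False/0 if censored
    risks  -- predicted risk score (higher means earlier recurrence expected)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the earlier patient had an observed
        # event; pairs with tied times are skipped in this simplified sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # earlier recurrence received the higher risk
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

For example, `concordance_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.6, 0.4, 0.1])` ranks every comparable pair correctly. A value of 0.5 corresponds to random ranking.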

Abstract

The non-invasive assessment of renal masses, which are increasingly discovered incidentally, is a critical challenge in urologic oncology, where diagnostic uncertainty frequently leads to the overtreatment of benign or indolent tumors. In this study, we developed and validated RenalCLIP, a visual-language foundation model for the characterization, diagnosis, and prognosis of renal masses, using a dataset of 27,866 CT scans from 8,809 patients across nine Chinese medical centers and the public TCIA cohort. The model was developed via a two-stage pre-training strategy that first enhances the image and text encoders with domain-specific knowledge and then aligns them through a contrastive learning objective, creating robust representations for superior generalization and diagnostic precision. RenalCLIP achieved better performance and superior generalizability across 10 core tasks spanning the full clinical…
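
The alignment stage mentioned in the abstract relies on a contrastive learning objective. The text here does not give the exact loss, temperature, or encoder details, so the following is only a sketch of the standard CLIP-style symmetric contrastive loss. It assumes each row of `image_emb` is paired with the same row of `text_emb`; the function names and the default temperature of 0.07 are illustrative choices, not values from the paper.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings."""
    img = [_normalize(v) for v in image_emb]
    txt = [_normalize(v) for v in text_emb]
    # Cosine similarity of every image with every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(i_vec, t_vec)) / temperature for t_vec in txt]
        for i_vec in img
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    image_to_text = _diagonal_cross_entropy(logits)
    text_to_image = _diagonal_cross_entropy(logits_t)
    return 0.5 * (image_to_text + text_to_image)
```

The loss is small when each scan embedding is most similar to its own report embedding and large when pairs are mismatched, which is what pulls the two encoders into a shared representation space.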

Code & Models

Models

Hugging Face: taoyh/RenalCLIP
