HiLa: Hierarchical Vision-Language Collaboration for Cancer Survival Prediction

Jiaqi Cui; Lu Wen; Yuchen Fei; Bo Liu; Luping Zhou; Dinggang Shen; Yan Wang

arXiv:2507.04613·cs.CV·July 8, 2025

TL;DR

The paper introduces HiLa, a hierarchical vision-language framework that enhances cancer survival prediction from whole-slide images by leveraging multi-level features and improved alignment with linguistic descriptions.

Contribution

It proposes a novel hierarchical vision-language collaboration framework with optimal prompt learning, cross-level propagation, and mutual contrastive learning for better survival prediction from WSIs.

Findings

01 Achieves state-of-the-art performance on three TCGA datasets.

02 Effectively models hierarchical interactions in WSIs.

03 Improves vision-language alignment for survival prediction.

Abstract

Survival prediction using whole-slide images (WSIs) is crucial in cancer research. Despite notable success, existing approaches are limited by their reliance on sparse slide-level labels, which hinders the learning of discriminative representations from gigapixel WSIs. Recently, vision language (VL) models, which incorporate additional language supervision, have emerged as a promising solution. However, VL-based survival prediction remains largely unexplored due to two key challenges. First, current methods often rely on only one simple language prompt and basic cosine similarity, which fails to learn fine-grained associations between multi-faceted linguistic information and visual features within WSI, resulting in inadequate vision-language alignment. Second, these methods primarily exploit patch-level information, overlooking the intrinsic hierarchy of WSIs and their…
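
For orientation, the single-prompt, cosine-similarity setup that the abstract describes as a limitation of current methods might look roughly like the sketch below. This illustrates that baseline idea only and is not HiLa's method. Everything in it is an assumption made for illustration: the function names, the mean pooling of patch features into one slide feature, the use of one prompt embedding per risk level, the softmax over similarities, and the random arrays standing in for real image and text encoder outputs.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n, d) and rows of b (m, d)."""
    a_unit = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a_unit @ b_unit.T


def single_prompt_scores(patch_features: np.ndarray,
                         prompt_embeddings: np.ndarray) -> np.ndarray:
    """Hypothetical baseline: score a slide against one prompt per risk level.

    patch_features:    (num_patches, d) visual features of one slide.
    prompt_embeddings: (num_levels, d) text features, one prompt per risk level.
    Returns a probability vector of length num_levels.
    """
    # Assumption: collapse all patches into one slide-level feature by averaging.
    slide_feature = patch_features.mean(axis=0, keepdims=True)      # (1, d)
    sims = cosine_similarity(slide_feature, prompt_embeddings)[0]   # (num_levels,)
    # Numerically stable softmax over the similarity scores.
    exp = np.exp(sims - sims.max())
    return exp / exp.sum()


# Random stand-ins for encoder outputs (not real data).
rng = np.random.default_rng(0)
patches = rng.normal(size=(500, 64))   # 500 patches, 64-dim features
prompts = rng.normal(size=(4, 64))     # 4 hypothetical risk-level prompts
probs = single_prompt_scores(patches, prompts)
```

Because every patch is averaged into a single vector and compared with a single prompt per level, this kind of setup has no way to relate different facets of a textual description to different regions of the slide, which is the gap the abstract points to.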
