Zero-shot segmentation of skin tumors in whole-slide images with vision-language foundation models
Santiago Moreno, Pablo Meseguer, Rocío del Amor, Valery Naranjo

TL;DR
This paper presents ZEUS, a zero-shot segmentation framework using vision-language models to accurately delineate skin tumors in whole-slide images, reducing annotation effort and enabling scalable diagnostic support.
Contribution
Introduces ZEUS, a novel zero-shot, fully automated segmentation pipeline leveraging frozen vision-language models for high-resolution tumor masks in histopathology WSIs.
Findings
Competitive performance on in-house datasets
Prompt design significantly affects segmentation quality
Framework reduces annotation effort and improves scalability
Abstract
Accurate annotation of cutaneous neoplasm biopsies represents a major challenge due to their wide morphological variability, overlapping histological patterns, and the subtle distinctions between benign and malignant lesions. Vision-language foundation models (VLMs), pre-trained on paired image-text corpora, learn joint representations that bridge visual features and diagnostic terminology, enabling zero-shot localization and classification of tissue regions without pixel-level labels. However, most existing VLM applications in histopathology remain limited to slide-level tasks or rely on coarse interactive prompts, and they struggle to produce fine-grained segmentations across gigapixel whole-slide images (WSIs). In this work, we introduce a zero-shot visual-language segmentation pipeline for whole-slide images (ZEUS), a fully automated, zero-shot segmentation framework that leverages…
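The abstract is cut off before it describes how ZEUS builds its masks, so the following is only a generic sketch of zero-shot patch labeling with a joint image-text embedding space, not the authors' pipeline. It assumes each WSI patch and each text prompt has already been embedded by a frozen VLM; the function names, the two prompts, and the 2-D toy vectors are all hypothetical and chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_mask(patch_embeddings, prompt_embeddings, grid_shape, tumor_index=1):
    """Label each patch with its most similar prompt and return a binary grid.

    patch_embeddings: one vector per patch, in row-major order over the grid.
    prompt_embeddings: one vector per class prompt (hypothetical classes).
    grid_shape: (rows, cols) of the patch grid tiled over the slide.
    tumor_index: position of the tumor prompt in prompt_embeddings.
    """
    rows, cols = grid_shape
    if len(patch_embeddings) != rows * cols:
        raise ValueError("number of patches must equal rows * cols")
    mask = [[0] * cols for _ in range(rows)]
    for i, patch in enumerate(patch_embeddings):
        sims = [cosine(patch, prompt) for prompt in prompt_embeddings]
        best = max(range(len(sims)), key=sims.__getitem__)
        mask[i // cols][i % cols] = 1 if best == tumor_index else 0
    return mask


# Toy stand-ins for VLM outputs (assumed, not real embeddings).
prompts = [[1.0, 0.0], [0.0, 1.0]]  # index 0: "normal skin", index 1: "tumor"
patches = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9], [0.7, 0.3]]
mask = zero_shot_mask(patches, prompts, grid_shape=(2, 2))
```

In a sketch like this, the wording of the prompts determines the text embeddings and therefore the mask, which is consistent with the finding that prompt design affects segmentation quality.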
Taxonomy
Topics: AI in cancer detection · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
