Exploring Prompt Alignment with Clinical Factors in Zero-Shot Segmentation VLMs for NSCLC Tumor Segmentation
Suraj Pai, Thibault Heintz, Cosmin Ciausu, Marion Tonneau, Hugo Aerts, Raymond Mak

TL;DR
This study investigates how individual prompt components affect the ability of zero-shot vision-language models to segment NSCLC tumors. It identifies anatomical location as the primary spatial cue and underscores the importance of prompt design.
Contribution
The paper provides a detailed analysis of prompt alignment factors in zero-shot segmentation VLMs, highlighting the dominance of anatomical location and patient-specific cues.
Findings
Anatomical location is the main driver of spatial attention in VoxTell.
Prompt specificity improves segmentation accuracy, except for diagnosis-only prompts.
VoxTell's performance is comparable to fine-tuned models and outperforms other zero-shot models.
Abstract
Zero-shot vision-language models (VLMs) offer a promptable alternative to task-specific training for gross tumor volume (GTV) delineation in non-small-cell lung cancer (NSCLC), but the prompt dimensions that govern their spatial behavior remain poorly understood. We study this question by probing alignment directions in VoxTell on a held-out internal NSCLC tumor dataset through sub-prompt decomposition into diagnosis, demographic, staging, anatomical, generic, and irrelevant controls; attribute-wise perturbation robustness; specificity ladders; and cross-case prompt swaps, while benchmarking against fine-tuned and zero-shot baselines using the Dice Similarity Coefficient (DSC) with Wilcoxon signed-rank tests and Benjamini-Hochberg correction. Alignment analyses revealed that anatomical location is the dominant driver of VoxTell's spatial attention: 63.4 percent of location perturbations…
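The abstract names two evaluation ingredients that are easy to state concretely: the Dice Similarity Coefficient and the Benjamini-Hochberg correction. The following is a minimal sketch of both in plain Python, not the authors' code. The function names `dice` and `benjamini_hochberg`, the flat-list mask representation, and the convention that two empty masks score 1.0 are assumptions made for illustration. The p-values fed to the correction would in practice come from a paired test such as `scipy.stats.wilcoxon`, which is not called here.

```python
def dice(pred, truth):
    """Dice Similarity Coefficient for two binary masks given as flat sequences."""
    pred = [bool(v) for v in pred]
    truth = [bool(v) for v in truth]
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy usage with made-up values (not results from the paper).
toy_dsc = dice([1, 1, 0, 0], [1, 0, 0, 0])
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A comparison would be declared significant when its adjusted p-value falls below the chosen false discovery rate, for example 0.05.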
