Prompt Triage: Structured Optimization Enhances Vision-Language Model Performance on Medical Imaging Benchmarks

Arnav Singhvi; Vasiliki Bikia; Asad Aali; Akshay Chaudhari; Roxana Daneshjou

arXiv:2511.11898·cs.CV·November 18, 2025

Open Access

TL;DR

This paper shows that structured automated prompt optimization improves vision-language model performance across five medical imaging tasks, with a median relative improvement of 53% over zero-shot baselines. It reduces the need for manual prompt engineering, which could make these models easier for medical institutions to deploy.

Contribution

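The paper adapts the Declarative Self-improving Python (DSPy) framework to medical vision-language systems and reports substantial performance gains across multiple tasks and models. Because the approach leaves model weights untouched, it avoids the large domain-specific datasets and compute that finetuning requires.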
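To make the idea of automated prompt optimization concrete, here is a minimal sketch of a metric-driven search over candidate instructions. It deliberately does not use DSPy's API and is not the authors' pipeline: `toy_model`, the candidate instructions, and the tiny development set are all hypothetical stand-ins, and a real system would call a vision-language model on images rather than match keywords in text.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]          # (input text, gold label); hypothetical format
Model = Callable[[str, str], str]  # (instruction, input) -> predicted label


def accuracy(model: Model, instruction: str, dev_set: List[Example]) -> float:
    """Fraction of dev examples the model labels correctly under one instruction."""
    correct = sum(model(instruction, x) == y for x, y in dev_set)
    return correct / len(dev_set)


def select_instruction(
    model: Model, candidates: List[str], dev_set: List[Example]
) -> Tuple[str, float]:
    """Score every candidate instruction and return the best one with its score."""
    scored = [(accuracy(model, c, dev_set), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def toy_model(instruction: str, finding: str) -> str:
    """Stand-in for a model call (assumption): it only answers usefully
    when the instruction constrains the output to a one-word label."""
    if "one word" in instruction:
        return "abnormal" if "opacity" in finding else "normal"
    return "unsure"


# Hypothetical development set and candidate prompts, for illustration only.
dev_set = [
    ("right lower lobe opacity", "abnormal"),
    ("clear lungs, no acute findings", "normal"),
    ("patchy opacity in left upper lobe", "abnormal"),
]
candidates = [
    "Describe the image.",
    "Answer in one word: normal or abnormal.",
]

best, score = select_instruction(toy_model, candidates, dev_set)
print(best, score)
```

The point of the sketch is the loop structure: the prompt is treated as a parameter chosen by a score on held-out examples rather than written by hand, and the model's weights never change.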

Findings

01. Median relative improvement of 53% over zero-shot baselines

02. Performance gains of up to 3,400% on challenging tasks

03. Scalable, privacy-preserving evaluation pipelines
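The percentages above are relative improvements, which can look very large when the baseline is close to zero. The sketch below shows the arithmetic under the usual definition, (optimized - baseline) / baseline; the score pairs are invented for illustration and are not results from the paper.

```python
from statistics import median


def relative_improvement(baseline: float, optimized: float) -> float:
    """Percent change of the optimized score relative to the baseline score."""
    if baseline <= 0:
        raise ValueError("baseline must be positive for a relative improvement")
    return (optimized - baseline) / baseline * 100.0


# Hypothetical (baseline, optimized) score pairs; NOT taken from the paper.
scores = [(0.40, 0.50), (0.02, 0.70), (0.50, 0.80)]

gains = [relative_improvement(b, o) for b, o in scores]
median_gain = median(gains)

for (b, o), g in zip(scores, gains):
    print(f"baseline={b:.2f} optimized={o:.2f} relative gain={g:.0f}%")
print(f"median relative gain={median_gain:.0f}%")
```

In this made-up example, moving from 0.02 to 0.70 is a relative gain of about 3,400%, while the median across the three pairs is far smaller. This is why a median and a maximum are reported separately: the maximum is dominated by tasks where the zero-shot baseline is near zero.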

Abstract

Vision-language foundation models (VLMs) show promise for diverse imaging tasks but often underperform on medical benchmarks. Prior efforts to improve performance include model finetuning, which requires large domain-specific datasets and significant compute, or manual prompt engineering, which is hard to generalize and often inaccessible to medical institutions seeking to deploy these tools. These challenges motivate interest in approaches that draw on a model's embedded knowledge while abstracting away dependence on human-designed prompts to enable scalable, weight-agnostic performance improvements. To explore this, we adapt the Declarative Self-improving Python (DSPy) framework for structured automated prompt optimization in medical vision-language systems through a comprehensive, formal evaluation. We implement prompting pipelines for five medical imaging tasks across radiology,…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning