Prompt Triage: Structured Optimization Enhances Vision-Language Model Performance on Medical Imaging Benchmarks
Arnav Singhvi, Vasiliki Bikia, Asad Aali, Akshay Chaudhari, Roxana Daneshjou

TL;DR
This paper demonstrates that structured automated prompt optimization significantly improves the performance of vision-language models on various medical imaging tasks, reducing the need for manual prompt engineering and enhancing clinical applicability.
Contribution
The paper adapts the DSPy framework to medical vision-language systems and reports substantial performance gains across multiple tasks and models without requiring large domain-specific datasets.
Findings
Median relative improvement of 53% over zero-shot baselines
Performance gains up to 3,400% on challenging tasks
Scalable, privacy-preserving evaluation pipelines
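The percentages above are relative improvements, so a very low zero-shot baseline can produce a very large figure. The sketch below shows one common way to compute such a number, assuming the usual definition (optimized minus baseline, divided by baseline); the paper's exact metric definitions are not given in this summary. The function name and the scores are hypothetical and chosen only to illustrate the arithmetic. They are not results from the paper.

```python
from statistics import median


def relative_improvement(baseline: float, optimized: float) -> float:
    """Percent change of an optimized score over a zero-shot baseline.

    Assumes the conventional definition: (optimized - baseline) / baseline * 100.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive to compute a relative change")
    return (optimized - baseline) / baseline * 100.0


# Hypothetical (baseline, optimized) score pairs, NOT taken from the paper.
# The last pair shows how a near-zero baseline inflates the relative gain.
scores = [(0.40, 0.50), (0.50, 0.80), (0.02, 0.70)]

gains = [relative_improvement(b, o) for b, o in scores]
median_gain = median(gains)

for (b, o), g in zip(scores, gains):
    print(f"baseline={b:.2f} optimized={o:.2f} relative gain={g:.0f}%")
print(f"median relative gain={median_gain:.0f}%")
```

Because a few extreme values can dominate a mean, the median is the more robust summary when gains span several orders of magnitude, which is consistent with the findings reporting a median alongside a maximum.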
Abstract
Vision-language foundation models (VLMs) show promise for diverse imaging tasks but often underperform on medical benchmarks. Prior efforts to improve performance include model finetuning, which requires large domain-specific datasets and significant compute, or manual prompt engineering, which is hard to generalize and often inaccessible to medical institutions seeking to deploy these tools. These challenges motivate interest in approaches that draw on a model's embedded knowledge while abstracting away dependence on human-designed prompts to enable scalable, weight-agnostic performance improvements. To explore this, we adapt the Declarative Self-improving Python (DSPy) framework for structured automated prompt optimization in medical vision-language systems through a comprehensive, formal evaluation. We implement prompting pipelines for five medical imaging tasks across radiology,…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
