Test-Time-Scaling for Zero-Shot Diagnosis with Visual-Language Reasoning
Ji Young Byun, Young-Jin Park, Navid Azizan, Rama Chellappa

TL;DR
This paper proposes a zero-shot medical diagnosis framework that combines vision-language models and large language models with a test-time scaling strategy to improve diagnostic accuracy and reliability across various medical imaging modalities.
Contribution
It introduces a novel test-time scaling method that enhances LLM reasoning in medical diagnosis without requiring supervised fine-tuning or extensive annotated data.
Findings
Improved diagnostic accuracy across multiple medical imaging modalities.
Enhanced reliability and consistency of LLM-generated diagnoses.
Effective zero-shot reasoning in clinical image analysis.
Abstract
As a cornerstone of patient care, clinical decision-making significantly influences patient outcomes and can be enhanced by large language models (LLMs). Although LLMs have demonstrated remarkable performance, their application to visual question answering in medical imaging, particularly for reasoning-based diagnosis, remains largely unexplored. Furthermore, supervised fine-tuning for reasoning tasks is largely impractical due to limited data availability and high annotation costs. In this work, we introduce a zero-shot framework for reliable medical image diagnosis that enhances the reasoning capabilities of LLMs in clinical settings through test-time scaling. Given a medical image and a corresponding textual prompt, a vision-language model generates multiple descriptions or interpretations of the visual features. These…
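The sampling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract is truncated before it says how the sampled descriptions are used, so the aggregation step here is plain majority voting (self-consistency), swapped in purely for illustration. The function name diagnose_with_test_time_scaling, the describe and reason callables, and the toy stand-ins fake_describe and fake_reason are all hypothetical; in practice they would wrap a real vision-language model and LLM.

```python
from collections import Counter
from typing import Callable, List, Tuple


def diagnose_with_test_time_scaling(
    image: bytes,
    prompt: str,
    describe: Callable[[bytes, str, int], str],
    reason: Callable[[str, str], str],
    n_samples: int = 5,
) -> Tuple[str, float]:
    """Sample several descriptions, diagnose each, and return the majority label.

    `describe` and `reason` are hypothetical stand-ins for a VLM and an LLM.
    Returns the winning label and the fraction of samples that agreed with it.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")

    # Step 1 (from the abstract): the VLM produces multiple descriptions
    # of the same image; here the seed stands in for sampling randomness.
    descriptions: List[str] = [describe(image, prompt, seed) for seed in range(n_samples)]

    # Step 2: the LLM proposes one diagnosis per description.
    candidates = [reason(text, prompt).strip().lower() for text in descriptions]

    # Step 3 (ASSUMPTION, not from the source): aggregate by majority vote.
    label, votes = Counter(candidates).most_common(1)[0]
    return label, votes / n_samples


def fake_describe(image: bytes, prompt: str, seed: int) -> str:
    """Toy VLM: cycles through canned findings instead of reading the image."""
    findings = [
        "lobar consolidation",
        "clear lung fields",
        "lobar consolidation with air bronchograms",
    ]
    return findings[seed % len(findings)]


def fake_reason(description: str, prompt: str) -> str:
    """Toy LLM: keyword rule in place of real clinical reasoning."""
    return "Pneumonia" if "consolidation" in description else "Normal"


diagnosis, agreement = diagnose_with_test_time_scaling(
    b"", "What is the most likely diagnosis?", fake_describe, fake_reason, n_samples=5
)
print(diagnosis, agreement)
```

Raising n_samples spends more inference-time compute on the same image, which is the general idea behind test-time scaling; the agreement fraction is one simple, assumed proxy for how consistent the sampled diagnoses are.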
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
