Test-Time-Scaling for Zero-Shot Diagnosis with Visual-Language Reasoning

Ji Young Byun; Young-Jin Park; Navid Azizan; Rama Chellappa

arXiv:2506.11166·cs.CV·June 16, 2025

Open Access

TL;DR

This paper proposes a zero-shot medical diagnosis framework that combines vision-language models and large language models with a test-time scaling strategy to improve diagnostic accuracy and reliability across various medical imaging modalities.

Contribution

It introduces a novel test-time scaling method that enhances LLM reasoning in medical diagnosis without requiring supervised fine-tuning or extensive annotated data.

Findings

01. Improved diagnostic accuracy across multiple medical imaging modalities.

02. Enhanced reliability and consistency of LLM-generated diagnoses.

03. Effective zero-shot reasoning in clinical image analysis.

Abstract

As a cornerstone of patient care, clinical decision-making significantly influences patient outcomes and can be enhanced by large language models (LLMs). Although LLMs have demonstrated remarkable performance, their application to visual question answering in medical imaging, particularly for reasoning-based diagnosis, remains largely unexplored. Furthermore, supervised fine-tuning for reasoning tasks is largely impractical due to limited data availability and high annotation costs. In this work, we introduce a zero-shot framework for reliable medical image diagnosis that enhances the reasoning capabilities of LLMs in clinical settings through test-time scaling. Given a medical image and a corresponding textual prompt, a vision-language model generates multiple descriptions or interpretations of the visual features. These…
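The sampling step described in the abstract (one image and prompt, several candidate descriptions) can be illustrated with a minimal sketch. Everything named here is hypothetical: `toy_vlm` is a stand-in for a real vision-language model, and `sample_descriptions` and `majority_vote` are illustrative helpers. Because the abstract is truncated before it explains how the candidates are used, the aggregation shown is a generic self-consistency-style majority vote, swapped in for illustration only and not the paper's method.

```python
import random
from collections import Counter
from typing import Callable, List

# Type of a (hypothetical) VLM call: image id, prompt, RNG -> one description.
VLM = Callable[[str, str, random.Random], str]


def toy_vlm(image_id: str, prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a vision-language model.

    A real model would condition on the image pixels and the prompt;
    this stub just draws from a fixed pool to mimic sampling variability.
    """
    pool = [
        "opacity in the left lower lobe",
        "clear lung fields",
        "opacity in the left lower lobe",
    ]
    return rng.choice(pool)


def sample_descriptions(vlm: VLM, image_id: str, prompt: str,
                        n: int, seed: int = 0) -> List[str]:
    """Test-time scaling step: query the VLM n times for the same input."""
    rng = random.Random(seed)
    return [vlm(image_id, prompt, rng) for _ in range(n)]


def majority_vote(candidates: List[str]) -> str:
    """Assumed aggregation (not from the source): most frequent candidate."""
    return Counter(candidates).most_common(1)[0][0]


if __name__ == "__main__":
    descriptions = sample_descriptions(
        toy_vlm, "img-001", "Describe the findings.", n=8
    )
    print(descriptions)
    print(majority_vote(descriptions))
```

Increasing `n` spends more inference-time compute on the same input, which is the general idea behind test-time scaling; how the paper's framework consumes the sampled descriptions is not visible in the truncated abstract.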

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques