Can Agents Distinguish Visually Hard-to-Separate Diseases in a Zero-Shot Setting? A Pilot Study

Zihao Zhao; Frederik Hauke; Juliana De Castilhos; Sven Nebelung; Daniel Truhn

arXiv:2602.22959·cs.CV·February 27, 2026

Open Access

TL;DR

This study evaluates the ability of multimodal large language model agents to distinguish visually confounded diseases in a zero-shot setting, highlighting potential and limitations for clinical application.

Contribution

Introduces a multi-agent contrastive adjudication framework to benchmark zero-shot diagnostic performance on challenging medical imaging tasks.

Findings

1. An 11-percentage-point accuracy improvement on dermoscopy data
2. Fewer unsupported claims in qualitative analysis
3. Performance remains below clinical deployment standards

Abstract

The rapid progress of multimodal large language models (MLLMs) has led to increasing interest in agent-based systems. While most prior work in medical imaging concentrates on automating routine clinical workflows, we study an underexplored yet clinically significant setting: distinguishing visually hard-to-separate diseases in a zero-shot setting. We benchmark representative agents on two imaging-only proxy diagnostic tasks, (1) melanoma vs. atypical nevus and (2) pulmonary edema vs. pneumonia, where visual features are highly confounded despite substantial differences in clinical management. We introduce a multi-agent framework based on contrastive adjudication. Experimental results show improved diagnostic performance (an 11-percentage-point gain in accuracy on dermoscopy data) and reduced unsupported claims on qualitative samples, although overall performance remains insufficient for…

Taxonomy

Topics: Cutaneous Melanoma Detection and Management · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI