PatientVLM Meets DocVLM: Pre-Consultation Dialogue Between Vision-Language Models for Efficient Diagnosis

K Lokesh; Abhirama Subramanyam Penamakuri; Uday Agarwal; Apoorva Challa; Shreya K Gowda; Somesh Gupta; Anand Mishra

arXiv:2601.10945 · cs.CV · January 19, 2026

TL;DR

This paper introduces a framework in which two vision-language models simulate doctor-patient dialogues, so that diagnosis can draw on patient-reported symptoms as well as the image. Licensed clinicians reviewed the synthetic symptoms in a small-scale validation.

Contribution

The paper proposes a Pre-Consultation Dialogue Framework (PCDF) in which a DocVLM questions a PatientVLM, eliciting realistic symptoms and improving diagnostic performance over image-only methods.

Findings

01. Dialogue-based supervision improves diagnosis accuracy

02. Clinicians confirm the realism of synthetic symptoms

03. Coherent multi-turn diagnostic interactions are achieved

Abstract

Traditionally, AI research in medical diagnosis has largely centered on image analysis. While this has led to notable advancements, the absence of patient-reported symptoms continues to hinder diagnostic accuracy. To address this, we propose a Pre-Consultation Dialogue Framework (PCDF) that mimics real-world diagnostic procedures, where doctors iteratively query patients before reaching a conclusion. Specifically, we simulate diagnostic dialogues between two vision-language models (VLMs): a DocVLM, which generates follow-up questions based on the image and dialogue history, and a PatientVLM, which responds using a symptom profile derived from the ground-truth diagnosis. We additionally conducted a small-scale clinical validation of the synthetic symptoms generated by our framework, with licensed clinicians confirming their clinical relevance, symptom coverage, and overall realism. These…
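The dialogue loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`simulate_dialogue`, `toy_doc_vlm`, `toy_patient_vlm`), the dictionary-shaped symptom profile, the fixed turn count, and the choice to pass the image to the patient side are all assumptions made for the example, and the two "models" are rule-based stand-ins rather than real VLMs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str]  # (doctor question, patient answer)


@dataclass
class Dialogue:
    image_id: str
    turns: List[Turn] = field(default_factory=list)


def simulate_dialogue(
    image_id: str,
    ask: Callable[[str, List[Turn]], str],
    answer: Callable[[str, Dict[str, str], str], str],
    symptom_profile: Dict[str, str],
    max_turns: int = 3,
) -> Dialogue:
    """Alternate doctor questions and patient answers for a fixed number of turns.

    `ask` plays the DocVLM role: it sees the image and the dialogue history.
    `answer` plays the PatientVLM role: it sees the symptom profile and the question.
    (Hypothetical interfaces; the paper's actual prompts and stopping rule may differ.)
    """
    dialogue = Dialogue(image_id)
    for _ in range(max_turns):
        question = ask(image_id, dialogue.turns)
        reply = answer(image_id, symptom_profile, question)
        dialogue.turns.append((question, reply))
    return dialogue


# Rule-based stand-ins so the sketch runs without any model weights.
def toy_doc_vlm(image_id: str, history: List[Turn]) -> str:
    topics = ["itching", "duration", "pain"]
    return f"Can you tell me about {topics[len(history) % len(topics)]}?"


def toy_patient_vlm(image_id: str, symptom_profile: Dict[str, str], question: str) -> str:
    # Answer only from the symptom profile; admit uncertainty otherwise.
    for topic, description in symptom_profile.items():
        if topic in question:
            return description
    return "I am not sure."


# Assumed example profile; in the paper it is derived from the ground-truth diagnosis.
profile = {
    "itching": "Yes, it itches at night.",
    "duration": "About two weeks.",
}
dialogue = simulate_dialogue("img_001", toy_doc_vlm, toy_patient_vlm, profile)
for question, reply in dialogue.turns:
    print("Doc:", question)
    print("Patient:", reply)
```

In this sketch the patient side can only answer from its symptom profile, which mirrors the abstract's description of the PatientVLM being conditioned on symptoms tied to the ground-truth diagnosis; the resulting question-answer pairs are the kind of record that could then accompany the image during diagnosis.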

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Multimodal Machine Learning Applications