Rethinking Patient Education as Multi-turn Multi-modal Interaction

Zonghai Yao; Zhipeng Tang; Chengtao Lin; Xiong Luo; Benlu Wang; Juncheng Huang; Chin Siang Ong; Hong Yu

arXiv:2604.14656 · cs.AI · April 17, 2026

TL;DR

This paper introduces MedImageEdu, a multimodal benchmark for patient education built on multi-turn, evidence-grounded interactions over images and text, designed to evaluate and improve medical AI systems.

Contribution

The paper presents a benchmark with a multi-agent setup for multimodal patient education, including a drawing tool the DoctorAgent can call and a set of evaluation dimensions, and targets gaps in faithfulness and safety.

Findings

1. Language quality often outpaces visual grounding fidelity.
2. Safety is the weakest aspect across disease categories.
3. Emotionally tense interactions are more challenging for models.

Abstract

Most medical multimodal benchmarks focus on static tasks such as image question answering, report generation, and plain-language rewriting. Patient education is more demanding: systems must identify relevant evidence across images, show patients where to look, explain findings in accessible language, and handle confusion or distress. Yet most patient education work remains text-only, even though combined image-and-text explanations may better support understanding. We introduce MedImageEdu, a benchmark for multi-turn, evidence-grounded radiology patient education. Each case provides radiology report text and the associated case images. A DoctorAgent interacts with a PatientAgent that is conditioned on a hidden profile capturing factors such as education level, health literacy, and personality. When a patient question would benefit from visual support, the DoctorAgent can issue drawing…
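To make the interaction structure described in the abstract concrete, here is a minimal sketch of how a case, a hidden patient profile, and a doctor/patient turn loop with an optional drawing request might be modeled. Everything in it is an assumption for illustration, not the authors' implementation: the names `Case`, `PatientProfile`, `DrawingCall`, `Turn`, `run_dialogue`, `toy_patient`, and `toy_doctor` are hypothetical, as are the patient-asks-first turn order, the bounding-box format of a drawing call, and the keyword trigger for visual support.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class PatientProfile:
    # Hidden profile: conditions only the PatientAgent; the DoctorAgent never sees it.
    education_level: str
    health_literacy: str
    personality: str


@dataclass
class Case:
    report_text: str       # radiology report text
    image_ids: List[str]   # identifiers of the case images


@dataclass
class DrawingCall:
    # Hypothetical drawing-tool request: highlight one region of one case image.
    image_id: str
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1); an assumed format
    label: str


@dataclass
class Turn:
    speaker: str                          # "doctor" or "patient"
    text: str
    drawing: Optional[DrawingCall] = None


# Assumed agent signatures: the doctor sees the case, the patient sees its profile.
DoctorFn = Callable[[Case, List[Turn]], Turn]
PatientFn = Callable[[PatientProfile, List[Turn]], Turn]


def run_dialogue(case: Case, profile: PatientProfile,
                 doctor: DoctorFn, patient: PatientFn,
                 max_rounds: int = 3) -> List[Turn]:
    """Alternate patient question and doctor answer for a fixed number of rounds."""
    history: List[Turn] = []
    for _ in range(max_rounds):
        history.append(patient(profile, history))  # patient asks
        history.append(doctor(case, history))      # doctor answers, maybe with a drawing
    return history


def toy_patient(profile: PatientProfile, history: List[Turn]) -> Turn:
    # Hypothetical stand-in for an LLM-driven PatientAgent.
    if profile.health_literacy == "low":
        return Turn("patient", "Where is the problem on the picture?")
    return Turn("patient", "Which image shows the finding?")


def toy_doctor(case: Case, history: List[Turn]) -> Turn:
    # Hypothetical stand-in for a DoctorAgent; a crude keyword check decides
    # whether the question calls for visual support.
    question = history[-1].text.lower()
    if "where" in question or "image" in question:
        call = DrawingCall(case.image_ids[0], (40, 60, 120, 140), "finding")
        return Turn("doctor", "I've outlined the area on the first image.", call)
    return Turn("doctor", "The report says: " + case.report_text)


if __name__ == "__main__":
    case = Case("Small nodule in the right lower lobe.", ["ct_slice_01"])
    profile = PatientProfile("high school", "low", "anxious")
    for turn in run_dialogue(case, profile, toy_doctor, toy_patient, max_rounds=2):
        print(turn.speaker, "|", turn.text, "|", turn.drawing)
```

Keeping `PatientProfile` out of the doctor's signature is one simple way to enforce that the profile stays hidden from the DoctorAgent; in a real setup the toy agents would be replaced by model calls, and the drawing request would be rendered onto the referenced image.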