Evaluating Large Language Models for Diagnostic Accuracy and Health Information Quality in Oral Mucosal Diseases
Melisa Iacob, Ayham Qawas, Ramesh Balasubramaniam, Agnieszka M. Frydrych, Omar Kujan

TL;DR
This study compares how well multimodal large language models and traditional search engines diagnose oral mucosal diseases. ChatGPT 4.5 was the most accurate, although language model output was harder to read than search engine results.
Contribution
The study evaluates multimodal large language models (MLLMs) on diagnostic accuracy, readability, and information quality for oral mucosal diseases, using traditional search engines as comparator platforms.
Findings
ChatGPT 4.5 showed the highest diagnostic accuracy (88.5%) and positive predictive value (PPV, 92%) among MLLMs.
Traditional search engines were much less accurate (18–55%) than MLLMs.
MLLMs provided higher-quality information but were less readable than search engine results.
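
For reference, accuracy and PPV of the kind reported above can be computed from a tally of correct and incorrect diagnoses. The sketch below is a minimal Python illustration, assuming a binary framing (condition present or absent). The class name, function names, and counts are hypothetical and are not taken from the study, which may define these metrics differently across its case scenarios.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical 2x2 tally of diagnostic outcomes (not study data)."""
    tp: int  # condition present, platform flagged it
    fp: int  # condition absent, platform flagged it anyway
    tn: int  # condition absent, platform did not flag it
    fn: int  # condition present, platform missed it


def accuracy(c: ConfusionCounts) -> float:
    """Share of all cases that were classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    if total == 0:
        raise ValueError("no cases to score")
    return (c.tp + c.tn) / total


def ppv(c: ConfusionCounts) -> float:
    """Positive predictive value: share of flagged cases that were truly positive."""
    flagged = c.tp + c.fp
    if flagged == 0:
        raise ValueError("no positive predictions to score")
    return c.tp / flagged


# Illustrative counts only, chosen to make the arithmetic easy to follow.
example = ConfusionCounts(tp=40, fp=10, tn=45, fn=5)
print(f"accuracy = {accuracy(example):.1%}")  # 85 correct out of 100 cases
print(f"PPV      = {ppv(example):.1%}")       # 40 true positives out of 50 flagged
```

The two metrics answer different questions: accuracy counts every correct call, while PPV looks only at the cases a platform flagged, which is why the two figures need not match.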
Abstract
Background: Multimodal large language model (MLLM)-based systems capable of generating health-related information and diagnostic suggestions are increasingly used for health information retrieval; however, their accuracy, readability, and quality in oral healthcare remain unclear. Oral mucosal diseases comprise a heterogeneous group of conditions affecting the oral lining, ranging from benign and reactive lesions to potentially malignant and malignant disorders. Objective: This study evaluated and compared the diagnostic performance, readability, and information quality of MLLMs, with traditional search engines included as comparator platforms, in diagnosing oral mucosal diseases. Methods: A cross-sectional observational study was conducted using 100 validated oral mucosal case scenarios representing benign, malignant, potentially malignant, infectious, and reactive oral lesions. Each…
Figure 1
Figure 2
Taxonomy
Topics: Health Literacy and Information Accessibility · Data-Driven Disease Surveillance · Artificial Intelligence in Healthcare and Education
