Evaluating Large Language Models for Diagnostic Accuracy and Health Information Quality in Oral Mucosal Diseases

Melisa Iacob; Ayham Qawas; Ramesh Balasubramaniam; Agnieszka M. Frydrych; Omar Kujan

PMC · DOI: 10.3390/jpm16030129 · February 27, 2026

Open Access

TL;DR

This study compares how accurately multimodal large language models (MLLMs) and traditional search engines diagnose oral mucosal diseases. ChatGPT 4.5 performed best, although MLLM output was less readable than search engine results.

Contribution

The study introduces a novel evaluation of multimodal large language models (MLLMs) for diagnosing oral mucosal diseases, using traditional search engines as comparators.

Findings

01

ChatGPT 4.5 showed the highest diagnostic accuracy (88.5%) and positive predictive value (PPV, 92%) among the MLLMs.

02

Traditional search engines had much lower accuracy (18–55%) than the MLLMs.

03

MLLMs provided higher-quality information but were less readable than search engine results.
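
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of the standard textbook definitions of accuracy and positive predictive value. The summary does not state how the authors computed them, so the binary (correct/incorrect diagnosis) framing, the function names, and the example counts are all illustrative assumptions and are not taken from the study.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all cases classified correctly: (TP + TN) / total."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one case is required")
    return (tp + tn) / total


def positive_predictive_value(tp: int, fp: int) -> float:
    """Share of positive calls that were truly positive: TP / (TP + FP)."""
    positives = tp + fp
    if positives == 0:
        raise ValueError("at least one positive prediction is required")
    return tp / positives


if __name__ == "__main__":
    # Hypothetical counts for illustration only; NOT data from the paper.
    tp, tn, fp, fn = 40, 45, 5, 10
    print(f"accuracy: {accuracy(tp, tn, fp, fn):.1%}")
    print(f"PPV:      {positive_predictive_value(tp, fp):.1%}")
```

Accuracy counts every correct call, whereas PPV looks only at positive calls, which is why a platform can score differently on the two.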

Abstract

Background: Multimodal large language model (MLLM)-based systems capable of generating health-related information and diagnostic suggestions are increasingly used for health information retrieval; however, their accuracy, readability, and quality in oral healthcare remain unclear. Oral mucosal diseases comprise a heterogeneous group of conditions affecting the oral lining, ranging from benign and reactive lesions to potentially malignant and malignant disorders. Objective: This study evaluated and compared the diagnostic performance, readability, and information quality of MLLMs, with traditional search engines included as comparator platforms, in diagnosing oral mucosal diseases. Methods: A cross-sectional observational study was conducted using 100 validated oral mucosal case scenarios representing benign, malignant, potentially malignant, infectious, and reactive oral lesions. Each…

Taxonomy

Topics: Health Literacy and Information Accessibility · Data-Driven Disease Surveillance · Artificial Intelligence in Healthcare and Education