Evaluating AI performance in infectious disease education: a comparative analysis of ChatGPT, Google Bard, Perplexity AI, Microsoft Copilot, and Meta AI

Abdulaziz Ibrahim Alzarea; Azfar Athar Ishaqui; Muhammad Bilal Maqsood; Abdullah Salah Alanazi; Aseel Awad Alsaidan; Tauqeer Hussain Mallhi; Narendar Kumar; Muhammad Imran; Sultan M. Alshahrani; Hassan H. Alhassan; Sami I. Alzarea; Omar Awad Alsaidan

PMC · DOI:10.3389/fmed.2025.1679153·October 13, 2025

Evaluating AI performance in infectious disease education: a comparative analysis of ChatGPT, Google Bard, Perplexity AI, Microsoft Copilot, and Meta AI

Abdulaziz Ibrahim Alzarea, Azfar Athar Ishaqui, Muhammad Bilal Maqsood, Abdullah Salah Alanazi, Aseel Awad Alsaidan, Tauqeer Hussain Mallhi, Narendar Kumar, Muhammad Imran, Sultan M. Alshahrani, Hassan H. Alhassan, Sami I. Alzarea, Omar Awad Alsaidan

PDF

Open Access

TL;DR

This study compares how well different AI systems answer infectious disease questions, finding that ChatGPT 3.5 is most accurate but inconsistent, while Microsoft Copilot is more stable but less nuanced.

Contribution

The study introduces a systematic evaluation of multiple AI platforms for infectious disease education using standardized case studies and repeated testing.

Findings

01

ChatGPT 3.5 had the highest accuracy (65.6%) but showed a 7.5% decline in repeated tests.

02

AI models performed best in symptom identification (76.5%) and worst in therapy-related questions (57.1%).

03

Microsoft Copilot was the most consistent across repeated testing but lacked nuanced therapeutic reasoning.

Abstract

This study systematically evaluates and compares the performance of ChatGPT 3. 5, Google Bard (Gemini), Perplexity AI, Microsoft Copilot, and Meta AI in responding to infectious disease-related multiple-choice questions (MCQs). A systematic comparative study was conducted using 20 infectious disease case studies sourced from Infectious Diseases: A Case Study Approach by Jonathan C. Cho. Each case study included 7–10 MCQs, resulting in a total of 160 questions. AI platforms were provided with standardized prompts containing the case study text and MCQs without additional context. Their responses were evaluated against a reference answer key from the textbook. Accuracy was measured by the percentage of correct responses, and consistency was assessed by submitting identical prompts 24 h apart. ChatGPT 3.5 achieved the highest numerical accuracy (65.6%), followed by Perplexity AI (63.2%),…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species1

Homo sapiens(human · species)

Diseases1

Infectious Diseases

Figures1

Click any figure to enlarge with its caption.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsArtificial Intelligence in Healthcare and Education · COVID-19 diagnosis using AI · Machine Learning in Healthcare