Evaluating AI performance in infectious disease education: a comparative analysis of ChatGPT, Google Bard, Perplexity AI, Microsoft Copilot, and Meta AI
Abdulaziz Ibrahim Alzarea, Azfar Athar Ishaqui, Muhammad Bilal Maqsood, Abdullah Salah Alanazi, Aseel Awad Alsaidan, Tauqeer Hussain Mallhi, Narendar Kumar, Muhammad Imran, Sultan M. Alshahrani, Hassan H. Alhassan, Sami I. Alzarea, Omar Awad Alsaidan

TL;DR
This study compares how well different AI systems answer infectious disease questions, finding that ChatGPT 3.5 is most accurate but inconsistent, while Microsoft Copilot is more stable but less nuanced.
Contribution
The study introduces a systematic evaluation of multiple AI platforms for infectious disease education using standardized case studies and repeated testing.
Findings
ChatGPT 3.5 had the highest accuracy (65.6%) but showed a 7.5% decline in repeated tests.
AI models performed best in symptom identification (76.5%) and worst in therapy-related questions (57.1%).
Microsoft Copilot was the most consistent across repeated testing but lacked nuanced therapeutic reasoning.
Abstract
This study systematically evaluates and compares the performance of ChatGPT 3. 5, Google Bard (Gemini), Perplexity AI, Microsoft Copilot, and Meta AI in responding to infectious disease-related multiple-choice questions (MCQs). A systematic comparative study was conducted using 20 infectious disease case studies sourced from Infectious Diseases: A Case Study Approach by Jonathan C. Cho. Each case study included 7–10 MCQs, resulting in a total of 160 questions. AI platforms were provided with standardized prompts containing the case study text and MCQs without additional context. Their responses were evaluated against a reference answer key from the textbook. Accuracy was measured by the percentage of correct responses, and consistency was assessed by submitting identical prompts 24 h apart. ChatGPT 3.5 achieved the highest numerical accuracy (65.6%), followed by Perplexity AI (63.2%),…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsArtificial Intelligence in Healthcare and Education · COVID-19 diagnosis using AI · Machine Learning in Healthcare
