Assessing LLMs' Performance: Insights from the Chinese Pharmacist Exam
Xinran Wang, Boran Zhu, Shujuan Zhou, Ziwen Long, Dehua Zhou, Shu Zhang

TL;DR
This study compares the performance of two large language models, ChatGPT-4o and DeepSeek-R1, on the Chinese Pharmacist Licensing Examination, revealing that the domain-specific DeepSeek-R1 outperforms the general-purpose ChatGPT-4o in accuracy and consistency.
Contribution
The paper provides a comparative analysis of LLMs on real exam questions, highlighting the superior performance of a domain-specific model in a high-stakes certification context.
Findings
DeepSeek-R1 achieved 90.0% accuracy versus 76.1% for ChatGPT-4o.
DeepSeek-R1 showed consistent advantages across modules.
Performance differences were not statistically significant in year-wise comparisons.
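As a rough illustration of how an accuracy gap like the one above could be checked, here is a minimal sketch of a two-sided two-proportion z-test using only the Python standard library. It rests on several assumptions that this summary does not confirm. The correct-answer counts are reconstructed from the rounded percentages. Both models are assumed to have been scored on all 2,306 questions. The test itself is a generic choice and not necessarily the one the authors used. Because both models answered the same items, a paired test such as McNemar's would be more appropriate if per-question results were available.

```python
import math


def two_proportion_z_test(correct_a, correct_b, n_a, n_b):
    """Two-sided z-test for the difference between two proportions.

    Treats the two samples as independent, which is a simplification
    when both models answer the same questions.
    """
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    # Pooled proportion under the null hypothesis of equal accuracy.
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# ASSUMPTION: counts are back-calculated from rounded accuracies and
# assume every one of the 2,306 questions was scored for both models.
N_QUESTIONS = 2306
deepseek_correct = round(0.900 * N_QUESTIONS)
chatgpt_correct = round(0.761 * N_QUESTIONS)

z, p = two_proportion_z_test(
    deepseek_correct, chatgpt_correct, N_QUESTIONS, N_QUESTIONS
)
print(f"z = {z:.2f}, p = {p:.3g}")
```

The same function applied to a single exam year would use a much smaller question count, which widens the standard error. That is one plausible reason a gap can be clear in the pooled data yet fall short of significance in year-wise comparisons.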
Abstract
Background: As large language models (LLMs) become increasingly integrated into digital health education and assessment workflows, their capabilities in supporting high-stakes, domain-specific certification tasks remain underexplored. In China, the national pharmacist licensure exam serves as a standardized benchmark for evaluating pharmacists' clinical and theoretical competencies. Objective: This study aimed to compare the performance of two LLMs, ChatGPT-4o and DeepSeek-R1, on real questions from the Chinese Pharmacist Licensing Examination (2017-2021), and to discuss the implications of these performance differences for AI-enabled formative evaluation. Methods: A total of 2,306 multiple-choice (text-only) questions were compiled from official exams, training materials, and public databases. Questions containing tables or images were excluded. Each item was input in its original…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Simulation-Based Education in Healthcare
