Assessing LLMs' Performance: Insights from the Chinese Pharmacist Exam

Xinran Wang; Boran Zhu; Shujuan Zhou; Ziwen Long; Dehua Zhou; Shu Zhang

arXiv:2511.20526·cs.AI·November 26, 2025

Open Access

TL;DR

This study compares the performance of two large language models, ChatGPT-4o and DeepSeek-R1, on the Chinese Pharmacist Licensing Examination, revealing that the domain-specific DeepSeek-R1 outperforms the general-purpose ChatGPT-4o in accuracy and consistency.

Contribution

The paper provides a comparative analysis of LLMs on real exam questions, highlighting the superior performance of a domain-specific model in a high-stakes certification context.

Findings

01. DeepSeek-R1 achieved 90.0% accuracy versus 76.1% for ChatGPT-4o.

02. DeepSeek-R1 showed consistent advantages across modules.

03. Performance differences were not statistically significant in year-wise comparisons.
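
To give a feel for the size of the overall gap, the following is a minimal sketch of a two-proportion z-test in Python. It is not the authors' analysis: this summary does not state which statistical test the paper used. The correct-answer counts are assumptions, reconstructed by rounding the reported percentages against the 2,306 questions mentioned in the abstract, and the function name `two_proportion_z_test` is hypothetical. Because both models answered the same items, a paired test such as McNemar's would be more appropriate, but it needs per-question results that are not available here.

```python
import math


def two_proportion_z_test(correct_a, correct_b, n):
    """Two-sided z-test for two accuracies, each measured on n items.

    Treats the two samples as independent, which is a simplification
    when both models answer the same questions.
    """
    p_a = correct_a / n
    p_b = correct_b / n
    # Pooled proportion under the null hypothesis of equal accuracy.
    pooled = (correct_a + correct_b) / (2 * n)
    standard_error = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# ASSUMPTION: counts are reconstructed from the reported percentages,
# assuming both accuracies were computed over all 2,306 questions.
N_QUESTIONS = 2306
deepseek_correct = round(0.900 * N_QUESTIONS)
chatgpt_correct = round(0.761 * N_QUESTIONS)

z, p_value = two_proportion_z_test(deepseek_correct, chatgpt_correct, N_QUESTIONS)
print(f"z = {z:.2f}, two-sided p = {p_value:.3g}")
```

Under these assumptions the pooled gap is many standard errors wide. The same calculation on a much smaller slice, such as a single exam year or module, yields a larger standard error, which is one way an overall difference can coexist with non-significant year-wise comparisons.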

Abstract

Background: As large language models (LLMs) become increasingly integrated into digital health education and assessment workflows, their capabilities in supporting high-stakes, domain-specific certification tasks remain underexplored. In China, the national pharmacist licensure exam serves as a standardized benchmark for evaluating pharmacists' clinical and theoretical competencies. Objective: This study aimed to compare the performance of two LLMs, ChatGPT-4o and DeepSeek-R1, on real questions from the Chinese Pharmacist Licensing Examination (2017-2021), and to discuss the implications of these performance differences for AI-enabled formative evaluation. Methods: A total of 2,306 multiple-choice (text-only) questions were compiled from official exams, training materials, and public databases. Questions containing tables or images were excluded. Each item was input in its original…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Simulation-Based Education in Healthcare