Evaluation of large language models in rheumatology and clinical immunology: a systematic assessment based on Chinese national health professional qualification examination
Yaqing Wang, Yue Jiang, Wen Jin, Yijun Xu, Weinan Lin, Jiangda Wang, Qin Song, Zhaoxi Fang

TL;DR
This study evaluates how well large language models perform in rheumatology and clinical immunology, using the Chinese National Health Professional Qualification Examination as the benchmark.
Contribution
The paper provides a systematic evaluation of 11 representative LLMs in rheumatology and clinical immunology, based on a national qualification examination and covering four dimensions of knowledge and practice.
Findings
DeepSeek-R1 and Qwen3 achieved over 90% accuracy in the exam.
LLMs showed significant variation in performance across different evaluation dimensions.
Performance on professional practice ability tasks was relatively low, indicating limitations in clinical application.
Abstract
In recent years, large language models (LLMs) have achieved remarkable progress in natural language processing and demonstrated potential applications in medicine. However, their professional capabilities in specific medical subfields, such as immunology, still require systematic evaluation. This study systematically evaluated 11 representative LLMs, including the DeepSeek, GPT, Llama, Gemma, and Qwen series, based on the Chinese National Health Professional Qualification Examination in Rheumatology and Clinical Immunology. The evaluation covered four dimensions: basic medical knowledge, related medical knowledge, immunology knowledge, and professional practice ability. The results showed significant differences among LLMs. DeepSeek-R1 and Qwen3 achieved the best performance, with accuracy exceeding 90%. However, performance on professional practice ability tasks remained relatively low,…
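The per-dimension accuracy described in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the record layout `(dimension, predicted, gold)`, the function name `accuracy_by_dimension`, and the toy answers are all assumptions made for illustration, and the toy answers are not data from the paper.

```python
from collections import defaultdict

# The four evaluation dimensions named in the abstract.
DIMENSIONS = (
    "basic medical knowledge",
    "related medical knowledge",
    "immunology knowledge",
    "professional practice ability",
)


def accuracy_by_dimension(records):
    """Return {dimension: fraction correct} for (dimension, predicted, gold) records.

    Hypothetical helper: answers are compared as case-insensitive option letters.
    Dimensions with no records are omitted from the result.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for dimension, predicted, gold in records:
        total[dimension] += 1
        if predicted.strip().upper() == gold.strip().upper():
            correct[dimension] += 1
    return {d: correct[d] / total[d] for d in DIMENSIONS if total[d]}


# Made-up placeholder answers for illustration only, not results from the study.
toy_records = [
    ("basic medical knowledge", "A", "A"),
    ("basic medical knowledge", "b", "B"),
    ("immunology knowledge", "C", "D"),
    ("professional practice ability", "A", "C"),
    ("professional practice ability", "D", "D"),
]

scores = accuracy_by_dimension(toy_records)
for dimension, accuracy in scores.items():
    print(f"{dimension}: {accuracy:.0%}")
```

Reporting accuracy separately for each dimension, rather than as one overall number, is what lets a gap such as the one on professional practice ability become visible.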
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Clinical Reasoning and Diagnostic Skills · Rheumatoid Arthritis Research and Therapies
