TCMD: A Traditional Chinese Medicine QA Dataset for Evaluating Large Language Models
Ping Yu, Kaitao Song, Fengchen He, Ming Chen, Jianfeng Lu

TL;DR
This paper introduces TCMD, a comprehensive Traditional Chinese Medicine (TCM) QA dataset designed to evaluate and improve large language models' performance on TCM question-answering tasks.
Contribution
The paper presents a new large-scale TCM QA dataset with annotated questions, supporting comprehensive assessment and analysis of LLMs in the TCM domain.
Findings
Current LLMs show inconsistency in TCM QA tasks.
Evaluation reveals gaps in LLMs' robustness and accuracy.
The dataset facilitates future development of TCM-specific LLMs.
Abstract
The recent unprecedented advancements in Large Language Models (LLMs) have propelled the medical community by establishing advanced medical-domain models. However, due to the limited collection of medical datasets, there are only a few comprehensive benchmarks available to gauge progress in this area. In this paper, we introduce TCMD, a new medical question-answering (QA) dataset that contains a large volume of manual instructions for solving Traditional Chinese Medicine (TCM) examination tasks. Specifically, TCMD collects a large number of questions across diverse domains, each annotated with its medical subject, which lets us comprehensively assess the capability of LLMs in the TCM domain. We conduct an extensive evaluation of various general-purpose and medical-domain-specific LLMs. Moreover, we analyze the robustness of current LLMs in solving TCM QA tasks by introducing randomness. The…
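The abstract does not say how randomness is introduced. One common way to probe robustness on multiple-choice QA is to shuffle the answer options and check whether the model still picks the correct one. The sketch below illustrates that idea only; it is not the authors' procedure. The function names (`shuffle_options`, `consistency`), the `model` callable, and the sample question are all hypothetical and are not taken from TCMD.

```python
import random
from typing import Callable, Dict, Tuple

# Hypothetical model interface: takes a question and labeled options,
# returns the label it selects (e.g. "A").
Model = Callable[[str, Dict[str, str]], str]


def shuffle_options(
    options: Dict[str, str], answer: str, rng: random.Random
) -> Tuple[Dict[str, str], str]:
    """Reassign option texts to labels at random; return the new options
    and the label that now holds the correct text.
    Assumes option texts are unique."""
    labels = sorted(options)
    texts = [options[label] for label in labels]
    rng.shuffle(texts)
    new_options = dict(zip(labels, texts))
    correct_text = options[answer]
    new_answer = next(l for l, t in new_options.items() if t == correct_text)
    return new_options, new_answer


def consistency(
    question: str,
    options: Dict[str, str],
    answer: str,
    model: Model,
    trials: int = 5,
    seed: int = 0,
) -> float:
    """Fraction of shuffled trials in which the model picks the correct option."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        shuffled, new_answer = shuffle_options(options, answer, rng)
        if model(question, shuffled) == new_answer:
            correct += 1
    return correct / trials


# Illustrative item (not from TCMD).
QUESTION = "In five-phase theory, which organ corresponds to Wood?"
OPTIONS = {"A": "Liver", "B": "Heart", "C": "Spleen", "D": "Lung", "E": "Kidney"}
ANSWER = "A"


def content_model(question: str, options: Dict[str, str]) -> str:
    """Toy model that finds the right text regardless of its position."""
    return next(l for l, t in options.items() if t == "Liver")


def position_biased_model(question: str, options: Dict[str, str]) -> str:
    """Toy model that always answers "A", ignoring the option texts."""
    return "A"


if __name__ == "__main__":
    print(consistency(QUESTION, OPTIONS, ANSWER, content_model))
    print(consistency(QUESTION, OPTIONS, ANSWER, position_biased_model))
```

A model that reads the option texts scores 1.0 under this check, while a position-biased model is correct only when the right text happens to land on its preferred label, which is the kind of inconsistency such a test is meant to surface.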
Taxonomy
Topics: Traditional Chinese Medicine Studies · Biomedical Text Mining and Ontologies
