Leveraging Group Relative Policy Optimization to Advance Large Language Models in Traditional Chinese Medicine
Jiacheng Xie, Shuai Zeng, Yang Yu, Xiaoting Tang, Guanghui An, Dong Xu

TL;DR
This paper introduces Ladder-base, a TCM-focused large language model trained with Group Relative Policy Optimization (GRPO), which the authors report improves reasoning and factual consistency on traditional Chinese medicine tasks.
Contribution
The study presents the first TCM-specific LLM trained with GRPO, enhancing reasoning and factual consistency over previous models and general-purpose LLMs.
Findings
Ladder-base outperforms state-of-the-art LLMs in reasoning metrics.
GRPO effectively aligns LLMs with expert-level reasoning.
Ladder-base demonstrates superior domain-specific performance.
Abstract
Traditional Chinese Medicine (TCM) presents a rich and structurally unique knowledge system that challenges conventional applications of large language models (LLMs). Although previous TCM-specific LLMs have shown progress through supervised fine-tuning, they often face limitations in alignment, data quality, and evaluation consistency. In this study, we introduce Ladder-base, the first TCM-focused LLM trained with Group Relative Policy Optimization (GRPO), a reinforcement learning method that improves reasoning and factual consistency by optimizing response selection based on intra-group comparisons. Ladder-base is built upon the Qwen2.5-7B-Instruct foundation model and trained exclusively on the textual subset of the TCM-Ladder benchmark, using 80 percent of the data for training and the remaining 20 percent split evenly between validation and test sets. Through standardized…
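The abstract describes GRPO as optimizing responses through intra-group comparisons. A minimal sketch of that idea, assuming the commonly used formulation in which each sampled response's reward is normalized by the mean and standard deviation of its own group: the function name, the example rewards, the `eps` term, and the choice of population standard deviation are illustrative assumptions and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each response relative to the other responses in its group.

    `rewards` holds one scalar reward per response sampled for the same
    prompt. `eps` (an assumed stabilizer) avoids division by zero when
    every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four answers sampled for one TCM question
# (1.0 = judged correct, 0.0 = judged incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group with identical rewards carries no preference signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

In this sketch, responses that score above their group's average get a positive advantage and those below get a negative one, so no separately learned value model is needed to provide a baseline.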
Taxonomy
Topics: Traditional Chinese Medicine Studies · Machine Learning in Healthcare · Topic Modeling
