# Improving TCM question answering through tree-organized self-reflective retrieval with LLMs

**Authors:** Chang Liu, Ying Chang, Jianmin Li, Yiqian Qu, Yu Li, Lingyong Cao, Shuyuan Lin

PMC · DOI: 10.3389/fmed.2026.1752778 · 2026-03-12

## TL;DR

This paper introduces Tree-Organized Self-Reflective Retrieval (TOSRR), a framework that improves LLM question answering on Traditional Chinese Medicine by organizing knowledge hierarchically and iteratively checking and correcting what it retrieves.

## Contribution

The novel TOSRR framework structures TCM knowledge hierarchically and uses self-reflective retrieval to enhance LLM performance in TCM Q&A.

## Key findings

- When integrated with GPT-4, the TOSRR framework improved absolute accuracy by 19.85% on the TCM Medical Licensing Examination (MLE) benchmark.
- Recall accuracy increased from 27% to 38% on the Classics Course Exam datasets.
- Expert manual evaluation showed a comprehensive improvement of 18.64 points across safety, consistency, explainability, compliance, and coherence.

## Abstract

Large language models (LLMs) offer significant potential for intelligent question answering (Q&A) in healthcare, yet traditional knowledge representation methods fail to capture the complex, hierarchical nature of Traditional Chinese Medicine (TCM) knowledge systems. The lack of effective retrieval-augmented generation (RAG) frameworks specifically tailored for TCM’s unique epistemology limits applications.

This study aims to evaluate the effectiveness of a novel Tree-Organized Self-Reflective Retrieval (TOSRR) framework in enhancing LLM performance on TCM Q&A tasks through innovative knowledge organization and dynamic self-correction mechanisms.

We developed a hierarchical knowledge representation system that structures TCM knowledge as subject-predicate-object-text (SPO-T) units within a tree-like architecture, enabling multi-dimensional relationships while preserving semantic context. Our iterative self-reflection mechanism implements dynamic knowledge retrieval and validation across textbook chapters and disciplines. Performance was evaluated using randomly selected questions from the TCM Medical Licensing Examination (MLE) and college Classics Course Exam (CCE), representing both standardized clinical knowledge and classical theory assessment.
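To make the two ideas in this paragraph concrete, here is a minimal Python sketch of SPO-T units stored in a tree and an iterative retrieve-check-expand loop. It is an illustration under stated assumptions, not the authors' implementation: the class and function names (`SPOTUnit`, `TreeNode`, `reflective_retrieve`), the keyword matcher, and the toy entries are all hypothetical, and in a real system the sufficiency check and query expansion would presumably be LLM calls rather than the simple callables used here.

```python
from dataclasses import dataclass, field


@dataclass
class SPOTUnit:
    """One subject-predicate-object triple plus the source text it came from."""
    subject: str
    predicate: str
    obj: str
    text: str


@dataclass
class TreeNode:
    """A node in the knowledge tree, e.g. a discipline, chapter, or section."""
    title: str
    units: list[SPOTUnit] = field(default_factory=list)
    children: list["TreeNode"] = field(default_factory=list)


def iter_units(node, path=()):
    """Yield (path, unit) pairs so every unit keeps its position in the tree."""
    here = path + (node.title,)
    for unit in node.units:
        yield here, unit
    for child in node.children:
        yield from iter_units(child, here)


def retrieve(root, terms):
    """Keyword match over the whole tree (a stand-in for a real retriever)."""
    lowered = [t.lower() for t in terms]
    hits = []
    for path, u in iter_units(root):
        haystack = f"{u.subject} {u.predicate} {u.obj} {u.text}".lower()
        if any(t in haystack for t in lowered):
            hits.append((path, u))
    return hits


def reflective_retrieve(root, terms, is_sufficient, expand, max_rounds=3):
    """Retrieve, check the evidence, and widen the query until it suffices."""
    evidence = []
    for _ in range(max_rounds):
        for hit in retrieve(root, terms):
            if hit not in evidence:
                evidence.append(hit)
        if is_sufficient(evidence):
            break
        terms = expand(terms, evidence)
    return evidence


# Toy tree with two disciplines (illustrative entries, not the paper's data).
root = TreeNode("TCM", children=[
    TreeNode("Formulas", children=[
        TreeNode("Exterior-releasing formulas", units=[
            SPOTUnit("Ma Huang Tang", "contains", "Gui Zhi",
                     "Ma Huang Tang consists of Ma Huang, Gui Zhi, "
                     "Xing Ren, and Zhi Gan Cao."),
        ]),
    ]),
    TreeNode("Herbs", children=[
        TreeNode("Exterior-releasing herbs", units=[
            SPOTUnit("Gui Zhi", "has property", "acrid, sweet, warm",
                     "Gui Zhi is acrid, sweet, and warm."),
        ]),
    ]),
])

evidence = reflective_retrieve(
    root,
    ["Ma Huang Tang"],
    # Hypothetical sufficiency rule: require at least two supporting units.
    is_sufficient=lambda ev: len(ev) >= 2,
    # Hypothetical expansion rule: follow the objects of units found so far.
    expand=lambda terms, ev: terms + [u.obj for _, u in ev],
)

for path, unit in evidence:
    print(" > ".join(path), "|", unit.subject, unit.predicate, unit.obj)
```

In this toy run the first round finds only the formula unit; the loop judges that insufficient, expands the query with the triple's object, and the second round pulls in the matching unit from a different discipline. Keeping the tree path alongside each unit is what lets the retrieved evidence retain its chapter and discipline context.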

When integrated with GPT-4, the TOSRR framework demonstrated a 19.85% improvement in absolute accuracy on the TCM MLE benchmark and increased recall accuracy from 27% to 38% on CCE datasets. Expert manual evaluation revealed substantial enhancements across critical dimensions: safety, consistency, explainability, compliance, and coherence, with a comprehensive improvement of 18.64 points. Retrieval-Augmented Generation Assessment (RAGAs) metrics confirmed the framework’s superior knowledge utilization, retrieval precision, and resistance to information noise compared to standard RAG approaches.
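As a reading aid, the CCE figures can be expressed both as an absolute and as a relative change. The arithmetic below uses only the 27% and 38% values reported above; the symbols $\Delta_{\text{abs}}$ and $\Delta_{\text{rel}}$ are introduced here for illustration. The same split cannot be worked out for the MLE result, because the baseline accuracy is not stated in this summary.

$$
\Delta_{\text{abs}} = 38\% - 27\% = 11 \text{ percentage points},
\qquad
\Delta_{\text{rel}} = \frac{38 - 27}{27} \approx 40.7\%
$$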

The TOSRR framework enhances LLM performance in TCM knowledge tasks through its hierarchical knowledge representation and self-reflective retrieval approach, and it has potential for application in teaching.

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13019696/full.md

---
Source: https://tomesphere.com/paper/PMC13019696