Traditional Chinese Medicine Case Analysis System for High-Level Semantic Abstraction: Optimized with Prompt and RAG
Peng Xu, Hongjin Wu, Jinle Wang, Rongjia Lin, Liwei Tan

TL;DR
This paper presents a system for analyzing Traditional Chinese Medicine (TCM) cases. It builds a structured case database through web scraping and data cleaning, then applies retrieval-augmented generation (RAG) and reranking to improve retrieval accuracy.
Contribution
The paper introduces an optimized method combining web scraping, data structuring, and retrieval enhancements such as RAG and reranking for TCM case analysis.
Findings
Collected over 5,000 TCM cases from multiple platforms.
Implemented RAG and rerank techniques to improve retrieval accuracy.
Developed a hybrid matching scheme that combines two-stage retrieval with Jieba keyword matching to improve the accuracy of model outputs.
Abstract
This paper details a technical plan for building a clinical case database for Traditional Chinese Medicine (TCM) using web scraping. Drawing on multiple platforms, including 360doc, we gathered over 5,000 TCM clinical cases, cleaned the data, and structured the dataset with key fields such as patient details, pathogenesis, syndromes, and annotations. Using the API, we removed redundant information and generated the final answers, outputting results in standard JSON format. During retrieval, we optimized data recall with RAG and rerank techniques and developed a hybrid matching scheme. By combining a two-stage retrieval method with keyword matching via Jieba, we significantly enhanced the accuracy of model outputs.
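The hybrid matching scheme described in the abstract (two-stage retrieval blended with Jieba keyword matching) might look roughly like the sketch below. This is not the authors' implementation: the abstract does not say how the scores are combined, so the weighted sum, the `alpha` weight, the function names, and the sample `CASES` records are all assumptions made for illustration. A character-count cosine similarity stands in for both the embedding recall and the reranker model, and a per-character fallback replaces Jieba when it is not installed.

```python
import math
from collections import Counter

try:
    import jieba  # third-party segmenter named in the abstract

    def keywords(text):
        # Segment with Jieba; drop whitespace and punctuation-only tokens.
        return {t for t in jieba.lcut(text) if any(ch.isalnum() for ch in t)}
except ImportError:
    def keywords(text):
        # Fallback so the sketch runs without Jieba: one keyword per character.
        return {ch for ch in text if ch.isalnum()}


def char_vector(text):
    # Toy stand-in for a dense embedding: counts of non-punctuation characters.
    return Counter(ch for ch in text if ch.isalnum())


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def keyword_overlap(query, doc):
    # Fraction of query keywords that also appear in the document.
    q = keywords(query)
    return len(q & keywords(doc)) / len(q) if q else 0.0


def hybrid_search(query, cases, recall_k=2, top_n=1, alpha=0.5):
    qv = char_vector(query)

    # Stage 1: cheap recall over the whole collection, keep recall_k candidates.
    recalled = sorted(
        cases, key=lambda cid: cosine(qv, char_vector(cases[cid])), reverse=True
    )[:recall_k]

    # Stage 2: rerank the candidates with a blended score (assumed weighted sum).
    def final_score(cid):
        rerank = cosine(qv, char_vector(cases[cid]))  # placeholder for a reranker model
        return alpha * rerank + (1 - alpha) * keyword_overlap(query, cases[cid])

    return sorted(recalled, key=final_score, reverse=True)[:top_n]


# Hypothetical sample records, not taken from the paper's dataset.
CASES = {
    "c1": "失眠，心悸，多梦，心脾两虚",
    "c2": "咳嗽，痰多，胸闷，痰湿蕴肺",
    "c3": "胃脘胀痛，嗳气，反酸，肝胃不和",
}
```

Calling `hybrid_search("失眠 心悸", CASES)` returns a list of case ids ordered by the blended score. In a real pipeline, stage 1 would query a vector index and stage 2 would call a reranker model; only the keyword step corresponds directly to a tool the abstract names.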
Taxonomy
Topics: Traditional Chinese Medicine Studies
Methods: Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Layer Normalization · Byte Pair Encoding · Adam · Residual Connection · Weight Decay · Softmax
