Evaluating and Enhancing Large Language Models Performance in Domain-specific Medicine: Osteoarthritis Management with DocOA
Xi Chen, MingKe You, Li Wang, WeiZhi Liu, Yu Fu, Jie Xu, Shaoting Zhang, Gang Chen, Kang Li, Jian Li

TL;DR
This paper evaluates the performance of large language models in osteoarthritis management, introduces a specialized model called DocOA, and presents a new benchmark framework for assessing domain-specific clinical AI capabilities.
Contribution
It develops a novel benchmark framework for domain-specific LLM evaluation and introduces DocOA, a tailored LLM that improves osteoarthritis management over general models.
Findings
General LLMs underperform in OA-specific tasks
DocOA significantly outperforms GPT-3.5 and GPT-4 in OA management
Tailored LLMs enhance clinical decision support in specialized domains
Abstract
The efficacy of large language models (LLMs) in domain-specific medicine, particularly for managing complex diseases such as osteoarthritis (OA), remains largely unexplored. This study focused on evaluating and enhancing the clinical capabilities of LLMs in specific domains, using OA management as a case study. A domain-specific benchmark framework was developed, which evaluates LLMs across a spectrum from domain-specific knowledge to clinical applications in real-world clinical scenarios. DocOA, a specialized LLM tailored for OA management that integrates retrieval-augmented generation (RAG) and instruction prompts, was developed. The study compared the performance of GPT-3.5, GPT-4, and a specialized assistant, DocOA, using objective and human evaluations. Results showed that general LLMs like GPT-3.5 and GPT-4 were less effective in the specialized domain of OA…
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
Methods: Attention Is All You Need · Cosine Annealing · Byte Pair Encoding · Adam · Label Smoothing · Linear Layer · Multi-Head Attention · Softmax
