Performance and Practical Considerations of Large and Small Language Models in Clinical Decision Support in Rheumatology
Sabine Felde, Rüdiger Buchkremer, Gamal Chehab, Christian Thielscher, Jörg HW Distler, Matthias Schneider, Jutta G. Richter

TL;DR
This study compares large and small language models for clinical decision support in rheumatology. It finds that smaller models combined with retrieval-augmented generation (RAG) outperform larger models in accuracy and efficiency, though expert oversight remains crucial.
Contribution
It demonstrates that smaller language models with retrieval-augmented generation can be more effective and resource-efficient than larger models in clinical decision support for rheumatology.
Findings
Smaller models with RAG outperform larger models in diagnostic and therapeutic performance.
Smaller models require less energy and are more cost-effective.
No model consistently reached specialist-level accuracy in rheumatology.
Abstract
Large language models (LLMs) show promise for supporting clinical decision-making in complex fields such as rheumatology. Our evaluation shows that smaller language models (SLMs), combined with retrieval-augmented generation (RAG), achieve higher diagnostic and therapeutic performance than larger models, while requiring substantially less energy and enabling cost-efficient, local deployment. These features are attractive for resource-limited healthcare. However, expert oversight remains essential, as no model consistently reached specialist-level accuracy in rheumatology.
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Rheumatoid Arthritis Research and Therapies
