Assessing The Potential Of Mid-Sized Language Models For Clinical QA
Elliot Bolton, Betty Xiong, Vijaytha Muralidharan, Joel Schamroth, Vivek Muralidharan, Christopher D. Manning, Roxana Daneshjou

TL;DR
This study evaluates mid-sized open-source language models for clinical question-answering, finding Mistral 7B performs best but still has room for improvement compared to larger models like Med-PaLM.
Contribution
First comprehensive head-to-head comparison of open-source mid-sized models on clinical QA tasks, highlighting Mistral 7B's superior performance.
Findings
Mistral 7B outperforms other models on clinical QA benchmarks.
Mistral 7B scores 63.0% on MedQA, approaching the original Med-PaLM.
Mid-sized models show promise but still need improvement for clinical deployment.
Abstract
Large language models, such as GPT-4 and Med-PaLM, have shown impressive performance on clinical tasks; however, they require access to compute, are closed-source, and cannot be deployed on device. Mid-size models such as BioGPT-large, BioMedLM, LLaMA 2, and Mistral 7B avoid these drawbacks, but their capacity for clinical tasks has been understudied. To help assess their potential for clinical use and help researchers decide which model they should use, we compare their performance on two clinical question-answering (QA) tasks: MedQA and consumer query answering. We find that Mistral 7B is the best performing model, winning on all benchmarks and outperforming models trained specifically for the biomedical domain. While Mistral 7B's MedQA score of 63.0% approaches the original Med-PaLM, and it often can produce plausible responses to consumer health queries, room for improvement still…
Taxonomy
Topics: Biomedical Text Mining and Ontologies
Methods: Attention Is All You Need · Dropout · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Linear Layer · Dense Connections · Label Smoothing
