Assessing The Potential Of Mid-Sized Language Models For Clinical QA

Elliot Bolton; Betty Xiong; Vijaytha Muralidharan; Joel Schamroth; Vivek Muralidharan; Christopher D. Manning; Roxana Daneshjou

arXiv:2404.15894·cs.CL·April 25, 2024

Open Access

TL;DR

This study evaluates mid-sized open-source language models for clinical question-answering, finding Mistral 7B performs best but still has room for improvement compared to larger models like Med-PaLM.

Contribution

First comprehensive head-to-head comparison of open-source mid-sized models on clinical QA tasks, highlighting Mistral 7B's superior performance.

Findings

01. Mistral 7B outperforms the other evaluated models on the clinical QA benchmarks.

02. Mistral 7B's MedQA score is 63.0%, approaching that of the original Med-PaLM.

03. Mid-sized models show promise but still need improvement before clinical deployment.

Abstract

Large language models, such as GPT-4 and Med-PaLM, have shown impressive performance on clinical tasks; however, they require access to compute, are closed-source, and cannot be deployed on device. Mid-size models such as BioGPT-large, BioMedLM, LLaMA 2, and Mistral 7B avoid these drawbacks, but their capacity for clinical tasks has been understudied. To help assess their potential for clinical use and help researchers decide which model they should use, we compare their performance on two clinical question-answering (QA) tasks: MedQA and consumer query answering. We find that Mistral 7B is the best performing model, winning on all benchmarks and outperforming models trained specifically for the biomedical domain. While Mistral 7B's MedQA score of 63.0% approaches the original Med-PaLM, and it often can produce plausible responses to consumer health queries, room for improvement still…
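As a rough illustration of what a MedQA-style score measures, the sketch below computes accuracy on multiple-choice questions as the percentage of predicted option letters that match the gold answers. It assumes predictions have already been reduced to a single option letter per question; the function name `multiple_choice_accuracy` and the toy data are hypothetical and are not taken from the paper, whose exact evaluation procedure is not described on this page.

```python
from typing import Sequence


def multiple_choice_accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the percentage of questions whose predicted option letter matches the gold letter."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    # Compare letters case-insensitively and ignore surrounding whitespace.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


# Hypothetical toy data (assumption): four questions, three answered correctly.
gold = ["A", "C", "B", "D"]
pred = ["A", "C", "D", "D"]
print(f"{multiple_choice_accuracy(pred, gold):.1f}%")  # prints "75.0%"
```

Under this reading, a score of 63.0% would correspond to roughly 63 correct answers per 100 questions.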

Taxonomy

Topics: Biomedical Text Mining and Ontologies

Methods: Attention Is All You Need · Dropout · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Linear Layer · Dense Connections · Label Smoothing