BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
Elliot Bolton, Abhinav Venigalla, Michihiro Yasunaga, David Hall, Betty Xiong, Tony Lee, Roxana Daneshjou, Jonathan Frankle, Percy Liang, Michael Carbin, Christopher D. Manning

TL;DR
BioMedLM is a compact 2.7B parameter biomedical language model trained solely on PubMed data, achieving competitive performance on biomedical NLP tasks and offering a privacy-preserving, efficient alternative to larger models.
Contribution
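The paper introduces and releases BioMedLM, a 2.7B parameter GPT-style autoregressive model trained exclusively on PubMed abstracts and full articles, and shows that, when fine-tuned, it is competitive with much larger models on multiple-choice biomedical question answering.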
Findings
Achieves 57.3% on MedMCQA (dev) when fine-tuned
Scores 69.0% on the MMLU Medical Genetics exam when fine-tuned
Can be fine-tuned to produce useful answers to patient questions on medical topics
Abstract
Models such as GPT-4 and Med-PaLM 2 have demonstrated impressive performance on a wide variety of biomedical NLP tasks. However, these models have hundreds of billions of parameters, are computationally expensive to run, require users to send their input data over the internet, and are trained on unknown data sources. Can smaller, more targeted models compete? To address this question, we build and release BioMedLM, a 2.7 billion parameter GPT-style autoregressive model trained exclusively on PubMed abstracts and full articles. When fine-tuned, BioMedLM can produce strong multiple-choice biomedical question-answering results competitive with much larger models, such as achieving a score of 57.3% on MedMCQA (dev) and 69.0% on the MMLU Medical Genetics exam. BioMedLM can also be fine-tuned to produce useful answers to patient questions on medical topics. This demonstrates that smaller…
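The multiple-choice results quoted above are accuracies: the share of questions for which the model's preferred option matches the answer key. The sketch below illustrates that arithmetic only. It is not the paper's evaluation code, and this summary does not say how BioMedLM assigns a score to each option; the function names and the example scores are hypothetical.

```python
from typing import Sequence


def predict_option(option_scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring answer option."""
    return max(range(len(option_scores)), key=lambda i: option_scores[i])


def multiple_choice_accuracy(
    all_scores: Sequence[Sequence[float]],
    answer_keys: Sequence[int],
) -> float:
    """Fraction of questions whose top-scored option equals the answer key."""
    if len(all_scores) != len(answer_keys):
        raise ValueError("need exactly one answer key per question")
    if not answer_keys:
        raise ValueError("need at least one question")
    correct = sum(
        predict_option(scores) == key
        for scores, key in zip(all_scores, answer_keys)
    )
    return correct / len(answer_keys)


# Hypothetical per-option scores (for example, log-likelihoods) for three
# four-option questions. These numbers are made up for illustration.
scores = [
    [-4.1, -2.3, -5.0, -3.7],  # top option: index 1
    [-1.2, -3.3, -2.8, -4.0],  # top option: index 0
    [-2.9, -2.5, -3.1, -2.7],  # top option: index 1
]
keys = [1, 0, 3]  # hypothetical answer key

accuracy = multiple_choice_accuracy(scores, keys)
print(f"accuracy = {accuracy:.1%}")
```

With this made-up data the model gets two of three questions right. A reported figure such as 57.3% is the same ratio computed over the full evaluation set.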
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Attention Is All You Need · Layer Normalization · Byte Pair Encoding · Softmax · Dropout · Multi-Head Attention
