MEDITRON-70B: Scaling Medical Pretraining for Large Language Models
Zeming Chen, Alejandro Hernández Cano, Angelika Romanou, Antoine Bonnet, Kyle Matoba, Francesco Salvi, Matteo Pagliardini, Simin Fan, Andreas Köpf, Amirkeivan Mohtashami, Alexandre Sallinen, Alireza Sakhaeirad, Vinitra Swamy, Igor Krawczuk, Deniz Bayazit, Axel Marmet

TL;DR
MEDITRON-70B is an open-source large medical language model with 70 billion parameters, trained on extensive medical data, outperforming many existing models and making advanced medical AI more accessible.
Contribution
The paper introduces MEDITRON, a suite of open-source medical LLMs with 7B and 70B parameters, built on Llama-2 with extended pretraining on a curated medical corpus, and shows that it outperforms existing baselines on medical benchmarks.
Findings
MEDITRON-70B outperforms several state-of-the-art baselines.
It achieves a 6% absolute performance gain over the best public baseline in its parameter class.
MEDITRON-70B surpasses GPT-3.5 and Med-PaLM on medical benchmarks.
Abstract
Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by releasing MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain. MEDITRON builds on Llama-2 (through our adaptation of Nvidia's Megatron-LM distributed trainer), and extends pretraining on a comprehensively curated medical corpus, including selected PubMed articles, abstracts, and internationally-recognized medical guidelines. Evaluations using four major medical benchmarks show significant performance gains over several state-of-the-art baselines before and…
Code & Models
- 🤗 epfl-llm/meditron-70b (model) · 331 downloads · ♡ 263
- 🤗 epfl-llm/meditron-7b (model) · 6.0k downloads · ♡ 319
- 🤗 TheBloke/meditron-70B-GPTQ (model) · 1.3k downloads · ♡ 5
- 🤗 TheBloke/meditron-70B-AWQ (model) · 116 downloads · ♡ 6
- 🤗 TheBloke/meditron-70B-GGUF (model) · 845 downloads · ♡ 20
- 🤗 TheBloke/meditron-7B-GPTQ (model) · 73 downloads · ♡ 3
- 🤗 TheBloke/meditron-7B-AWQ (model) · 5.8k downloads · ♡ 4
- 🤗 TheBloke/meditron-7B-GGUF (model) · 1.0k downloads · ♡ 24
- 🤗 AGBonnet/medinote-7b (model) · 18 downloads · ♡ 10
- 🤗 AGBonnet/medinote-13b (model) · 6 downloads · ♡ 2
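The base checkpoints listed above are hosted on the Hugging Face Hub, so one plausible way to try them is through the `transformers` auto classes. The following is a minimal sketch, not the authors' documented usage: the helper names `MEDITRON_REPOS`, `load_meditron`, and `complete` are hypothetical, and it assumes `transformers` and PyTorch are installed and that your account has access to the repositories (they may require accepting a license on the Hub first). Only the repository IDs come from the list.

```python
# Hypothetical helper names; only the repository IDs come from the list above.
MEDITRON_REPOS = {
    "7b": "epfl-llm/meditron-7b",
    "70b": "epfl-llm/meditron-70b",
}


def load_meditron(size="7b"):
    """Return (tokenizer, model) for the chosen checkpoint size."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = MEDITRON_REPOS[size]
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model


def complete(prompt, size="7b", max_new_tokens=64):
    """Generate a plain-text continuation of `prompt`."""
    tokenizer, model = load_meditron(size)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # output_ids[0] holds the prompt tokens followed by the generated tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("...")` downloads the full weights on first use. The 7B checkpoint is the practical starting point; the 70B checkpoint generally needs multiple GPUs or one of the quantized community builds (GPTQ, AWQ, GGUF) listed above, which are loaded through their own tooling rather than this sketch.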
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Machine Learning in Healthcare
Methods: Attention Is All You Need · LLaMA · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Cosine Annealing · Multi-Head Attention · Residual Connection · Transformer · Byte Pair Encoding
