MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

Zeming Chen; Alejandro Hernández Cano; Angelika Romanou; Antoine Bonnet; Kyle Matoba; Francesco Salvi; Matteo Pagliardini; Simin Fan; Andreas Köpf; Amirkeivan Mohtashami; Alexandre Sallinen; Alireza Sakhaeirad; Vinitra Swamy; Igor Krawczuk; Deniz Bayazit; Axel Marmet; Syrielle Montariol; Mary-Anne Hartley; Martin Jaggi; Antoine Bosselut

arXiv:2311.16079 · cs.CL · November 28, 2023 · 118 cites

Open Access · 1 Repo · 10 Models · 5 Datasets

TL;DR

MEDITRON-70B is an open-source large medical language model with 70 billion parameters, trained on extensive medical data, outperforming many existing models and making advanced medical AI more accessible.

Contribution

The paper introduces MEDITRON, a large-scale open-source medical LLM with 7B and 70B parameters, trained on curated medical data, and demonstrates its superior performance over existing models.

Findings

01

MEDITRON-70B outperforms several state-of-the-art baselines.

02

It achieves a 6% absolute performance gain over the best public baseline.

03

MEDITRON-70B surpasses GPT-3.5 and Med-PaLM on medical benchmarks.
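
Finding 02 reports an absolute gain, which is a difference in percentage points rather than a ratio. The short sketch below illustrates the distinction. The accuracy values are hypothetical placeholders chosen for easy arithmetic, not results from the paper, and the function names are illustrative only.

```python
def absolute_gain(baseline_acc: float, model_acc: float) -> float:
    """Gain in percentage points: model accuracy minus baseline accuracy."""
    return model_acc - baseline_acc


def relative_gain(baseline_acc: float, model_acc: float) -> float:
    """Gain as a percentage of the baseline accuracy."""
    return 100.0 * (model_acc - baseline_acc) / baseline_acc


# Hypothetical accuracies (in percent), NOT taken from the paper.
baseline = 60.0
model = 66.0

print(f"absolute gain: {absolute_gain(baseline, model):.1f} points")
print(f"relative gain: {relative_gain(baseline, model):.1f}%")
```

With these placeholder numbers, the same improvement reads as 6 points in absolute terms but 10% in relative terms, which is why the qualifier "absolute" matters when comparing reported gains.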

Abstract

Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by releasing MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain. MEDITRON builds on Llama-2 (through our adaptation of Nvidia's Megatron-LM distributed trainer), and extends pretraining on a comprehensively curated medical corpus, including selected PubMed articles, abstracts, and internationally-recognized medical guidelines. Evaluations using four major medical benchmarks show significant performance gains over several state-of-the-art baselines before and…

Code & Models

Repositories

epfllm/meditron
PyTorch · Official

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Machine Learning in Healthcare

Methods: Attention Is All You Need · LLaMA · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Cosine Annealing · Multi-Head Attention · Residual Connection · Transformer · Byte Pair Encoding