The Llama 3 Herd of Models
Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev

TL;DR
This paper introduces Llama 3, a new family of multilingual foundation models with up to 405B parameters, supporting coding, reasoning, and tool use, and evaluates their performance across diverse tasks.
Contribution
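It presents the development and extensive empirical evaluation of Llama 3, publicly releases the models (including pre-trained and post-trained versions of the 405B-parameter language model and the Llama Guard 3 safety model), and reports experiments on multimodal extensions.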
Findings
Llama 3 achieves comparable quality to GPT-4 on many tasks.
Multimodal extensions perform competitively on image, video, and speech recognition.
The multimodal models are not yet broadly released because they are still under development.
Abstract
Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs…
Code & Models
- 🤗 meta-llama/Llama-Guard-3-8B · model · 83k dl · ♡ 283
- 🤗 meta-llama/Llama-Guard-4-12B · model · 86k dl · ♡ 88
- 🤗 benjamin-paine/taproot-common · model · 433 dl · ♡ 5
- 🤗 meta-llama/Llama-Guard-3-8B-INT8 · model · 8.6k dl · ♡ 38
- 🤗 tokyotech-llm/Llama-3.1-Swallow-8B-v0.1 · model · 99 dl · ♡ 10
- 🤗 Najii/Llama-Guard · model
- 🤗 Najii/Llama-Guard-3-8B-INT8 · model
- 🤗 tokyotech-llm/Llama-3.1-Swallow-70B-v0.1 · model · 26 dl · ♡ 4
- 🤗 tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.1 · model · 273 dl · ♡ 17
- 🤗 tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.1 · model · 46 dl · ♡ 4
Taxonomy
Methods: Attention Is All You Need · Sparse Evolutionary Training · Label Smoothing · Adam · Linear Layer · Byte Pair Encoding · Layer Normalization · Softmax · Position-Wise Feed-Forward Layer · Dense Connections
