The Falcon Series of Open Language Models
Ebtesam Almazrouei, Hamza Alobeidli, Abdulaziz Alshamsi, Alessandro Cappelli, Ruxandra Cojocaru, Mérouane Debbah, Étienne Goffinet, Daniel Hesslow, Julien Launay, Quentin Malartic, Daniele Mazzotta, Badreddine Noune, Baptiste Pannier, Guilherme Penedo

TL;DR
The paper introduces the Falcon series of open-source large language models, highlighting their training process, performance, and the release of datasets and models to promote open science.
Contribution
It presents the Falcon models (7B, 40B, and 180B parameters), the largest of which was trained on over 3.5 trillion tokens, along with detailed training methods and open datasets intended to advance open-source LLM development.
Findings
Falcon-180B outperforms models like PaLM and Chinchilla.
Falcon-180B approaches PaLM-2-Large performance at a reduced pretraining and inference cost.
Open datasets and models are released to foster open science.
Abstract
We introduce the Falcon series: 7B, 40B, and 180B-parameter causal decoder-only models trained on diverse high-quality corpora predominantly assembled from web data. The largest model, Falcon-180B, has been trained on over 3.5 trillion tokens of text--the largest openly documented pretraining run. Falcon-180B significantly outperforms models such as PaLM or Chinchilla, and improves upon concurrently developed models such as LLaMA 2 or Inflection-1. It nears the performance of PaLM-2-Large at a reduced pretraining and inference cost, making it, to our knowledge, one of the three best language models in the world along with GPT-4 and PaLM-2-Large. We report detailed evaluations, as well as a deep dive into the methods and custom tooling employed to pretrain Falcon. Notably, we report on our custom distributed training codebase, allowing us to efficiently pretrain these models on up to…
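To give a sense of scale for the pretraining run described in the abstract, the sketch below applies the widely used rule of thumb that training compute is roughly 6 × N × D FLOPs, where N is the parameter count and D is the number of training tokens. This heuristic and the function name `approx_train_flops` are assumptions introduced here for illustration; the result is a back-of-the-envelope estimate, not a figure reported by the authors.

```python
# Rough pretraining-compute estimate using the common C ≈ 6 * N * D heuristic.
# Assumption: the heuristic and this helper are illustrative, not from the paper.

SECONDS_PER_DAY = 86_400
PETAFLOP = 1e15  # floating-point operations


def approx_train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs as 6 * parameters * tokens."""
    return 6.0 * n_params * n_tokens


# Falcon-180B: 180B parameters, "over 3.5 trillion" tokens (3.5e12 used as a floor).
falcon_180b_flops = approx_train_flops(180e9, 3.5e12)

# Convert to petaflop/s-days, a unit often used to compare training runs.
falcon_180b_pfs_days = falcon_180b_flops / (PETAFLOP * SECONDS_PER_DAY)

print(f"approx. FLOPs: {falcon_180b_flops:.2e}")
print(f"approx. petaflop/s-days: {falcon_180b_pfs_days:,.0f}")
```

Because the abstract says "over" 3.5 trillion tokens, the estimate should be read as a lower bound under this heuristic.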
Code & Models
- 🤗 tiiuae/falcon-11B · model · 5.0k dl · ♡ 218
- 🤗 LoneStriker/falcon-11B-GGUF · model · 59 dl · ♡ 3
- 🤗 vsevolodl/falcon-11B-GGUF · model · 27 dl · ♡ 1
- 🤗 RichardErkhov/tiiuae_-_falcon-11B-gguf · model · 78 dl
- 🤗 LiteLLMs/falcon-11B-GGUF · model · 22 dl
- 🤗 QuantFactory/falcon-11B-GGUF · model · 261 dl · ♡ 3
- 🤗 cortexso/falcon3 · model · 43 dl · ♡ 1
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Dropout · Byte Pair Encoding · Softmax · Layer Normalization · Linear Layer · Position-Wise Feed-Forward Layer · Absolute Position Encodings
