The Falcon Series of Open Language Models

Ebtesam Almazrouei; Hamza Alobeidli; Abdulaziz Alshamsi; Alessandro Cappelli; Ruxandra Cojocaru; Mérouane Debbah; Étienne Goffinet; Daniel Hesslow; Julien Launay; Quentin Malartic; Daniele Mazzotta; Badreddine Noune; Baptiste Pannier; Guilherme Penedo

arXiv:2311.16867·cs.CL·December 1, 2023·114 cites

TL;DR

The paper introduces the Falcon series of open-source large language models, highlighting their training process, performance, and the release of datasets and models to promote open science.

Contribution

It presents the Falcon models, the largest of which is a 180B-parameter model trained on over 3.5 trillion tokens, along with detailed training methods and open datasets intended to advance open-source LLM development.

Findings

01

Falcon-180B outperforms models like PaLM and Chinchilla.

02

Falcon-180B approaches PaLM-2-Large performance at a reduced pretraining and inference cost.

03

Open datasets and models are released to foster open science.

Abstract

We introduce the Falcon series: 7B, 40B, and 180B parameter causal decoder-only models trained on diverse, high-quality corpora predominantly assembled from web data. The largest model, Falcon-180B, has been trained on over 3.5 trillion tokens of text, the largest openly documented pretraining run. Falcon-180B significantly outperforms models such as PaLM or Chinchilla, and improves upon concurrently developed models such as LLaMA 2 or Inflection-1. It nears the performance of PaLM-2-Large at a reduced pretraining and inference cost, making it, to our knowledge, one of the three best language models in the world along with GPT-4 and PaLM-2-Large. We report detailed evaluations, as well as a deep dive into the methods and custom tooling employed to pretrain Falcon. Notably, we report on our custom distributed training codebase, allowing us to efficiently pretrain these models on up to…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Dropout · Byte Pair Encoding · Softmax · Layer Normalization · Linear Layer · Position-Wise Feed-Forward Layer · Absolute Position Encodings