Mistral 7B

Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed

arXiv:2310.06825·cs.CL·October 11, 2023·278 cites

TL;DR

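Mistral 7B is a 7-billion-parameter language model that outperforms Llama 2 13B on all evaluated benchmarks and Llama 1 34B in reasoning, mathematics, and code generation. It uses grouped-query attention (GQA) for faster inference and sliding window attention (SWA) to handle long sequences at reduced inference cost. An instruction-tuned variant is also released.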

Contribution

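Introduces Mistral 7B, which combines grouped-query attention and sliding window attention to outperform larger models (Llama 2 13B, and Llama 1 34B on reasoning, mathematics, and code generation) while lowering inference cost, and provides an instruction-tuned variant, Mistral 7B – Instruct, that surpasses Llama 2 13B – Chat on human and automated benchmarks.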

Findings

01. Outperforms Llama 2 13B on all evaluated benchmarks

02. Surpasses Llama 1 34B in reasoning, mathematics, and code generation

03. Uses GQA for faster inference and SWA to handle long sequences at reduced inference cost
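To make the grouped-query attention (GQA) finding concrete, here is a minimal sketch of how several query heads can share one key/value head, and how that shrinks the key/value cache kept during inference. The head counts, layer count, and head dimension are toy values chosen for illustration, not the model's actual configuration, and grouping consecutive query heads is an assumed convention; the function names are hypothetical.

```python
def kv_head_for_query_head(q_head: int, n_heads: int, n_kv_heads: int) -> int:
    """Return the key/value head index shared by query head `q_head`.

    Assumption: consecutive query heads are grouped together.
    """
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be a multiple of n_kv_heads")
    group_size = n_heads // n_kv_heads
    return q_head // group_size


def kv_cache_elements(seq_len: int, n_layers: int, n_kv_heads: int, head_dim: int) -> int:
    """Count cached elements: one key and one value tensor per layer."""
    return 2 * n_layers * seq_len * n_kv_heads * head_dim


# Toy configuration (illustrative values only).
N_HEADS = 8
N_KV_HEADS = 2

# Which key/value head each query head reads from.
mapping = [kv_head_for_query_head(h, N_HEADS, N_KV_HEADS) for h in range(N_HEADS)]

# Cache size with one KV head per query head versus with shared KV heads.
mha_cache = kv_cache_elements(seq_len=1024, n_layers=4, n_kv_heads=N_HEADS, head_dim=64)
gqa_cache = kv_cache_elements(seq_len=1024, n_layers=4, n_kv_heads=N_KV_HEADS, head_dim=64)

print(mapping)                 # [0, 0, 0, 0, 1, 1, 1, 1]
print(mha_cache // gqa_cache)  # 4
```

In this toy setup the cache shrinks by the ratio of query heads to key/value heads, which is one reason GQA can speed up decoding.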

Abstract

We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B – Instruct, that surpasses the Llama 2 13B – Chat model both on human and automated benchmarks. Our models are released under the Apache 2.0 license.
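The sliding window attention mentioned in the abstract can be illustrated with a small causal mask in which each position attends only to a fixed number of most recent positions. This is a sketch under stated assumptions, not the authors' implementation: the function name is hypothetical, the window size is a toy value, and the window is assumed to count the current position.

```python
from typing import List


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """Build a causal sliding-window mask.

    mask[i][j] is True when query position i may attend to key position j.
    Assumption: the window includes the current position, so position i
    sees positions i - window + 1 through i.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [[(i - window < j <= i) for j in range(seq_len)] for i in range(seq_len)]


# Toy sizes for illustration only.
mask = sliding_window_mask(seq_len=5, window=3)

# Number of positions each query attends to; bounded by the window size.
attended = [sum(row) for row in mask]

for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
print(attended)  # [1, 2, 3, 3, 3]
```

Because each row has at most `window` allowed positions, per-token attention cost stays bounded as the sequence grows, which is the intuition behind the reduced inference cost.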

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning and Data Classification

Methods: Attention Is All You Need · Dense Connections · Softmax · Feedforward Network · Grouped-query attention