Mistral 7B
Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril

TL;DR
Mistral 7B is a 7-billion-parameter language model that outperforms Llama 2 13B on all evaluated benchmarks. It uses grouped-query attention and sliding window attention for efficient inference, and an instruction-tuned variant is released alongside it.
Contribution
Introduces Mistral 7B, which combines grouped-query attention and sliding window attention to outperform larger models at lower inference cost, and provides an instruction-tuned variant that surpasses Llama 2 13B Chat.
Findings
Outperforms Llama 2 13B across all evaluated benchmarks
Surpasses Llama 1 34B in reasoning, math, and code generation
Efficient inference with GQA and SWA mechanisms
Abstract
We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B – Instruct, that surpasses the Llama 2 13B – Chat model both on human and automated benchmarks. Our models are released under the Apache 2.0 license.
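The two attention mechanisms named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. The function names, the toy window size, and the toy head counts are all assumptions chosen for illustration, and the contiguous grouping of query heads is one common convention rather than something the abstract specifies.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Causal sliding-window mask (hypothetical helper).

    mask[i][j] is True when query position i may attend to key position j,
    i.e. j is not in the future and lies within the last `window` positions.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Map a query head to the key/value head it shares under GQA.

    Assumes contiguous grouping: consecutive query heads share one KV head.
    """
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


if __name__ == "__main__":
    # Toy values only: 5 tokens, window of 3, 8 query heads sharing 2 KV heads.
    for row in sliding_window_mask(seq_len=5, window=3):
        print("".join("x" if allowed else "." for allowed in row))
    print([kv_head_for_query_head(h, 8, 2) for h in range(8)])
```

With a window of 3, each token attends to itself and at most the two preceding tokens, so the per-token attention cost is bounded by the window size instead of growing with sequence length. Sharing each key/value head across a group of query heads shrinks the key/value cache that must be kept during decoding.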
Code & Models
- 🤗 mistralai/Mistral-7B-Instruct-v0.2 (model) · 2.6M dl · ♡ 3104
- 🤗 mistralai/Mistral-7B-v0.1 (model) · 450k dl · ♡ 4059
- 🤗 mistralai/Mistral-7B-Instruct-v0.1 (model) · 361k dl · ♡ 1828
- 🤗 dfurman/Mistral-7B-Instruct-v0.1 (model) · 27 dl · ♡ 2
- 🤗 CATIE-AQ/mistral7B-FR-InstructNLP-LoRA (model) · 12 dl · ♡ 3
- 🤗 Trelis/Mistral-7B-Instruct-v0.1-function-calling-adapters-v2 (model)
- 🤗 Trelis/Mistral-7B-Instruct-v0.1-function-calling-v2 (model) · ♡ 33
- 🤗 uukuguy/speechless-mistral-dolphin-orca-platypus-samantha-7b (model) · 660 dl · ♡ 17
- 🤗 uukuguy/speechless-mistral-six-in-one-7b (model) · 675 dl · ♡ 4
- 🤗 asfxxx/MModel (model) · 17 dl
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning and Data Classification
Methods: Attention Is All You Need · Dense Connections · Softmax · Feedforward Network · Grouped-query attention
