Multilingual E5 Text Embeddings: A Technical Report
Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei

TL;DR
This report details the development and evaluation of open-source multilingual E5 text embedding models, highlighting their training process, different sizes, and competitive performance with state-of-the-art models.
Contribution
It introduces a new multilingual E5 embedding model family with multiple sizes and an instruction-tuned variant, based on contrastive pre-training and fine-tuning.
Findings
The instruction-tuned model performs on par with state-of-the-art English-only models of similar size
Three sizes offer a trade-off between efficiency and quality
Open-source release facilitates broader adoption and research
Abstract
This technical report presents the training methodology and evaluation results of the open-source multilingual E5 text embedding models, released in mid-2023. Three embedding models of different sizes (small / base / large) are provided, offering a trade-off between inference efficiency and embedding quality. The training procedure follows the English E5 model recipe: contrastive pre-training on 1 billion multilingual text pairs, followed by fine-tuning on a combination of labeled datasets. Additionally, we introduce a new instruction-tuned embedding model whose performance is on par with state-of-the-art English-only models of similar sizes. Information regarding the model release can be found at https://github.com/microsoft/unilm/tree/master/e5.
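The contrastive pre-training stage can be illustrated with a minimal sketch of an InfoNCE loss over in-batch negatives, assuming that objective (the standard setup for this kind of text-pair pre-training). This is not the authors' code: the plain Python vectors stand in for encoder outputs, the function names are hypothetical, and the temperature of 0.01 is an assumed value, since the abstract does not state hyperparameters.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce_loss(
    queries: Sequence[Sequence[float]],
    passages: Sequence[Sequence[float]],
    temperature: float = 0.01,  # assumed value, not taken from the report
) -> float:
    """Mean InfoNCE loss where passages[i] is the positive for queries[i]
    and every other passage in the batch serves as a negative."""
    q = [l2_normalize(v) for v in queries]
    p = [l2_normalize(v) for v in passages]
    total = 0.0
    for i, qi in enumerate(q):
        # Temperature-scaled cosine similarity against every passage in the batch.
        logits = [sum(a * b for a, b in zip(qi, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching passage as the target class.
        total += log_sum_exp - logits[i]
    return total / len(q)
```

With perfectly aligned pairs the loss is near zero; swapping the positives drives it up sharply, which is the signal that pulls paired texts together and pushes in-batch negatives apart.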
Code & Models
- 🤗 intfloat/multilingual-e5-large (model) · 4.5M dl · ♡ 1166
- 🤗 intfloat/multilingual-e5-small (model) · 3.9M dl · ♡ 296
- 🤗 intfloat/multilingual-e5-base (model) · 2.5M dl · ♡ 344
- 🤗 intfloat/multilingual-e5-large-instruct (model) · 1.3M dl · ♡ 609
- 🤗 dragonkue/multilingual-e5-small-ko-v2 (model) · 4.2k dl · ♡ 3
- 🤗 aisingapore/SEA-LION-E5-Embedding-600M (model) · 77 dl · ♡ 1
- 🤗 Nehc/e5-large-ru (model) · 103 dl · ♡ 1
- 🤗 beademiguelperez/sentence-transformers-multilingual-e5-small (model) · 4.1k dl · ♡ 1
- 🤗 ildodeltaRule/multilingual-e5-large (model) · 39 dl
- 🤗 Avditvs/multilingual-e5-small-distill-base-0.1 (model) · 3 dl · ♡ 5
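The intfloat checkpoints listed above expect specific input formats. According to their Hugging Face model cards, the small/base/large models take `query: ` and `passage: ` prefixes, the instruct model takes queries of the form `Instruct: {task}\nQuery: {query}` (documents without a prefix), and sentence embeddings are obtained by average pooling the final hidden states over non-padding tokens. The sketch below shows only this pre- and post-processing in plain Python; the helper names are hypothetical, and in practice the token vectors would come from the model itself, which is not loaded here.

```python
import math
from typing import List, Sequence


def format_e5_input(text: str, is_query: bool) -> str:
    """Prefix text for multilingual-e5-{small,base,large} (per the model cards)."""
    return ("query: " if is_query else "passage: ") + text


def format_instruct_query(task: str, query: str) -> str:
    """Query template for multilingual-e5-large-instruct (per its model card)."""
    return f"Instruct: {task}\nQuery: {query}"


def average_pool(
    hidden_states: Sequence[Sequence[float]], attention_mask: Sequence[int]
) -> List[float]:
    """Mean of token vectors, ignoring padding positions (mask value 0)."""
    dim = len(hidden_states[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden_states, attention_mask):
        if keep:
            for j in range(dim):
                summed[j] += vec[j]
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [s / count for s in summed]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two pooled embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

The model cards note that omitting the prefixes still yields embeddings but tends to degrade retrieval quality, so the formatting step matters as much as the pooling.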
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
Methods: Balanced Selection
