The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding
Kenneth Enevoldsen, Márton Kardos, Niklas Muennighoff, Kristoffer Laigaard Nielbo

TL;DR
This paper introduces the Scandinavian Embedding Benchmark (SEB), a framework for evaluating text embeddings for Scandinavian languages across 24 tasks. Evaluating more than 26 models with it reveals performance gaps between public and commercial models, and SEB is integrated with the existing MTEB benchmark.
Contribution
The paper presents SEB, the first extensive benchmark for Scandinavian language embeddings, and demonstrates its integration with MTEB to improve multilingual evaluation.
Findings
Evaluating more than 26 models reveals significant performance disparities between public and commercial models that MTEB had not previously captured.
SEB covers 24 tasks, 10 subtasks, and 4 task categories for Scandinavian languages.
SEB is open-sourced and integrated with MTEB.
Abstract
The evaluation of English text embeddings has transitioned from evaluating a handful of datasets to broad coverage across many tasks through benchmarks such as MTEB. However, this is not the case for multilingual text embeddings due to a lack of available benchmarks. To address this problem, we introduce the Scandinavian Embedding Benchmark (SEB). SEB is a comprehensive framework that enables text embedding evaluation for Scandinavian languages across 24 tasks, 10 subtasks, and 4 task categories. Building on SEB, we evaluate more than 26 models, uncovering significant performance disparities between public and commercial solutions not previously captured by MTEB. We open-source SEB and integrate it with MTEB, thus bridging the text embedding evaluation gap for Scandinavian languages.
Taxonomy
Topics: linguistics and terminology studies
