ML-Embed: Inclusive and Efficient Embeddings for a Multilingual World

Ziyin Zhang; Zihan Liao; Hang Yu; Peng Di; Rui Wang

arXiv:2605.15081 · cs.CL · May 15, 2026

TL;DR

ML-Embed is a suite of inclusive, efficient multilingual embedding models built on a new framework, 3-Dimensional Matryoshka Learning (3D-ML), that targets three barriers in text embeddings: high computational cost, narrow language coverage, and lack of transparency.

Contribution

The paper presents ML-Embed and the 3D-ML framework, combining efficiency, multilingual coverage, and transparency, with models and data openly released for reproducibility.

Findings

01 Models set new records on 9 of 17 MTEB benchmarks.

02 Strong performance in low-resource languages.

03 Efficient across the entire model lifecycle.

Abstract

The development of high-quality text embeddings is increasingly drifting toward an exclusionary future, defined by three critical barriers: prohibitive computational costs, a narrow linguistic focus that neglects most of the world's languages, and a lack of transparency from closed-source or open-weight models that stifles research. To dismantle these barriers, we introduce ML-Embed, a suite of inclusive and efficient models built upon a new framework: 3-Dimensional Matryoshka Learning (3D-ML). Our framework addresses the computational challenge with comprehensive efficiency across the entire model lifecycle. Beyond the storage benefits of Matryoshka Representation Learning (MRL) and flexible inference-time depth provided by Matryoshka Layer Learning (MLL), we introduce Matryoshka Embedding Learning (MEL) for enhanced parameter efficiency. To address the linguistic challenge, we curate…
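
The storage benefit attributed to Matryoshka Representation Learning (MRL) above comes from the fact that an MRL-trained embedding can be cut to its leading coordinates and still be used for similarity search. The following is a minimal sketch of that truncation step only, assuming a generic MRL-style vector. The function names `truncate_embedding` and `cosine` and the toy vectors are hypothetical illustrations, not part of the ML-Embed release, and the sketch does not cover MLL or MEL.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` coordinates of `vec` and L2-normalize the result.

    Hypothetical helper: illustrates generic MRL-style truncation,
    not the ML-Embed API.
    """
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # A zero vector cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 4-dimensional vectors standing in for full-size embeddings (made-up values).
query = [3.0, 4.0, 12.0, 1.0]
document = [3.0, 4.0, 0.5, 2.0]

# Store and compare only the first 2 dimensions: half the storage per vector.
short_query = truncate_embedding(query, 2)
short_document = truncate_embedding(document, 2)
similarity = cosine(short_query, short_document)
```

In a real MRL model, the training objective is what makes the leading coordinates informative on their own; truncating an embedding from a model trained without such an objective would generally lose more quality.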
