m3BERT: A Modern, Multi-lingual, Matryoshka Bidirectional Encoder

Yaoxiang Wang; Simiao Zuo; Qingguo Hu; Yucheng Ding; Yeyun Gong; Jian Jiao; Jinsong Su

arXiv:2605.19568 · cs.CL · May 20, 2026

TL;DR

m3BERT is a multi-lingual embedding model whose pretraining strategy lets a single model adapt to varying resource and accuracy constraints. It outperforms existing models on industrial retrieval tasks.

Contribution

The paper introduces m3BERT, a multi-lingual, multi-granular transformer model with a new pretraining approach that enhances adaptability and performance in resource-constrained industrial retrieval systems.

Findings

01. m3BERT outperforms state-of-the-art models on the Bing-Click dataset.

02. The model adapts effectively to diverse resource and accuracy requirements.

03. Multi-granular pretraining improves generalization on public datasets.

Abstract

Embedding models are pivotal in industrial information retrieval systems like search and advertising. However, existing pretrained models often exhibit fixed architectures and embedding dimensionalities, posing significant challenges when adapting them to diverse deployment scenarios with varying business-driven constraints. A common practice involves fine-tuning with partial parameter initialization from larger pretrained models for resource-constrained tasks. This method is often suboptimal as the misalignment between pretraining and downstream usage prevents full realization of pretraining benefits. To address this limitation, we introduce m3BERT: a Modern, Multi-lingual, Matryoshka Bidirectional Encoder, which features a novel pretraining strategy that jointly optimizes representations across both transformer layers and multiple embedding dimensions. This enables a single model to…
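
The abstract is truncated here and does not state the training objective, so the following is only a minimal sketch of what "jointly optimizing representations across transformer layers and multiple embedding dimensions" could look like, not the authors' method. It assumes a generic Matryoshka-style setup: pooled vectors are taken from several layers, truncated to several prefix lengths, and an in-batch contrastive loss is averaged over every (layer, dimension) pair. The function names, the `query_states[layer][example]` layout, the temperature, and the toy vectors are all hypothetical choices made for illustration.

```python
import math
from itertools import product


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` coordinates and L2-normalize the result."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]


def info_nce(queries, docs, temperature=0.05):
    """In-batch contrastive loss: queries[i] is the positive for docs[i]."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, d)) / temperature for d in docs]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_z - logits[i]
    return total / len(queries)


def joint_loss(query_states, doc_states, layers, dims):
    """Average the contrastive loss over every (layer, dim) pair.

    Assumption for this sketch: query_states[layer][i] is the pooled
    vector of example i taken from that transformer layer.
    """
    losses = []
    for layer, dim in product(layers, dims):
        q = [truncate_and_normalize(v, dim) for v in query_states[layer]]
        d = [truncate_and_normalize(v, dim) for v in doc_states[layer]]
        losses.append(info_nce(q, d))
    return sum(losses) / len(losses)


# Toy pooled vectors for a 2-example batch at two hypothetical layers.
query_states = {
    1: [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]],
    2: [[0.9, 0.1, 0.0, 0.2], [0.1, 0.8, 0.3, 0.0]],
}
doc_states = {
    1: [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.2]],
    2: [[1.0, 0.0, 0.1, 0.1], [0.0, 1.0, 0.2, 0.1]],
}

loss = joint_loss(query_states, doc_states, layers=[1, 2], dims=[2, 4])
print(f"joint loss over 4 (layer, dim) configurations: {loss:.6f}")
```

Under this kind of objective, a deployment with a tight budget could in principle serve a shallow layer with a short embedding prefix, while a less constrained one uses the full depth and width, both from the same pretrained weights.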
