Cross-lingual Matryoshka Representation Learning across Speech and Text

Yaya Sy; Dioula Doucouré; Christophe Cerisara; Irina Illina

arXiv:2602.19991·cs.CL·April 22, 2026

TL;DR

This paper presents a bilingual speech-text embedding model for French-Wolof that enables efficient retrieval and generalizes to other tasks, with analysis of cost-accuracy trade-offs and modality fusion strategies.

Contribution

It introduces the first bilingual speech-text Matryoshka embedding model for under-represented languages, along with new benchmarks and data curation pipelines.

Findings

01

Fusing the speech modality into a frozen text Matryoshka model performs best among the compared modeling strategies.

02

The model generalizes well to speech intent detection.

03

Information is concentrated in a few components, which enables efficiency improvements.
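The efficiency finding relies on the Matryoshka property: a prefix of the embedding vector is itself a usable, lower-cost embedding. The sketch below illustrates that general idea, assuming embeddings have already been computed. The function names, the toy 4-dimensional vectors, and the float32 storage assumption are all hypothetical and are not taken from Oolel-Embed. The sketch truncates vectors to their first `dim` components, re-normalizes them, ranks documents by cosine similarity, and reports the raw index size that each prefix length implies.

```python
import math
from typing import List, Sequence, Tuple


def truncate_and_normalize(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components (the Matryoshka prefix) and L2-normalize them."""
    prefix = [float(x) for x in vec[:dim]]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        return prefix
    return [x / norm for x in prefix]


def retrieve(query: Sequence[float], docs: Sequence[Sequence[float]],
             dim: int, top_k: int = 1) -> List[Tuple[int, float]]:
    """Rank documents by cosine similarity using only the first `dim` dimensions."""
    q = truncate_and_normalize(query, dim)
    scored = []
    for idx, doc in enumerate(docs):
        d = truncate_and_normalize(doc, dim)
        # Both vectors are unit length, so the dot product is the cosine similarity.
        scored.append((idx, sum(a * b for a, b in zip(q, d))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def index_size_bytes(n_docs: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw size of a flat index; assumes float32 values (4 bytes) and no compression."""
    return n_docs * dim * bytes_per_value


# Hypothetical toy data: one "speech query" embedding and three "text" embeddings.
query = [0.9, 0.1, 0.3, -0.2]
docs = [
    [0.8, 0.2, -0.5, 0.4],
    [-0.7, 0.6, 0.1, 0.0],
    [0.1, -0.9, 0.3, 0.2],
]
for dim in (2, 4):
    print(dim, retrieve(query, docs, dim, top_k=3), index_size_bytes(1_000_000, dim))
```

Storage and similarity cost grow linearly with `dim`, so the cost-accuracy trade-off reduces to choosing the shortest prefix whose ranking quality is still acceptable. In practice that choice would be measured on a benchmark rather than on toy vectors like these.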

Abstract

Speakers of under-represented languages face two barriers. The first is a language barrier, because most online knowledge is in a few dominant languages. The second is a modality barrier, because information is largely text-based while many languages are primarily oral. We address this for French-Wolof by training the first bilingual speech-text Matryoshka embedding model, enabling efficient retrieval of French text from Wolof speech queries without relying on a costly ASR-plus-translation pipeline. We introduce large-scale data curation pipelines and new benchmarks, compare modeling strategies, and show that modality fusion within a frozen text Matryoshka model performs best. Although trained only for retrieval, the model generalizes well to other tasks, such as speech intent detection, indicating that it learns general semantic representations. Finally, we analyze cost-accuracy trade-offs across Matryoshka dimensions and…
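The summary does not state the exact training objective. One common way to train a Matryoshka retrieval model applies the same contrastive loss at several nested prefix lengths and sums the results, so that every prefix is pushed to be a good embedding on its own. The sketch below illustrates that general recipe under stated assumptions: an in-batch InfoNCE loss between speech-query and text embeddings, toy prefix sizes `(2, 4)`, uniform weights, and a temperature of 0.05. All of these are hypothetical choices for illustration, not the authors' configuration.

```python
import math
from typing import List, Optional, Sequence


def _normalize(vec: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else [float(x) for x in vec]


def info_nce(queries: Sequence[Sequence[float]], targets: Sequence[Sequence[float]],
             temperature: float = 0.05) -> float:
    """In-batch contrastive loss: query i's positive is target i; other targets are negatives."""
    q = [_normalize(v) for v in queries]
    t = [_normalize(v) for v in targets]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, tj)) / temperature for tj in t]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with the diagonal as the label
    return total / len(q)


def matryoshka_loss(queries, targets, dims=(2, 4),
                    weights: Optional[Sequence[float]] = None,
                    temperature: float = 0.05) -> float:
    """Weighted sum of contrastive losses computed on nested embedding prefixes."""
    if weights is None:
        weights = [1.0] * len(dims)  # hypothetical: uniform weights
    assert len(weights) == len(dims), "one weight per prefix size"
    return sum(
        w * info_nce([v[:d] for v in queries], [v[:d] for v in targets], temperature)
        for w, d in zip(weights, dims)
    )


# Hypothetical batch of two (Wolof speech query, French text) embedding pairs.
speech = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]
text = [[0.9, 0.1, 0.4, 0.0], [0.1, 0.9, 0.0, 0.6]]
print(matryoshka_loss(speech, text))                  # matched pairs: lower loss
print(matryoshka_loss(speech, list(reversed(text))))  # mismatched pairs: higher loss
```

Summing the loss over prefixes is what allows a single trained model to be truncated at inference time without retraining.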

Code & Models

Models

soynade-research/Oolel-Embed (Hugging Face model)