Cross-Lingual Interleaving for Speech Language Models

Adel Moumen; Guangzhi Sun; Philip C. Woodland

arXiv:2512.01865 · cs.CL · February 23, 2026

Open Access

TL;DR

This paper introduces a cross-lingual interleaving method for spoken language models that improves multilingual understanding and cross-lingual capabilities without textual supervision, supported by new datasets and benchmarks.

Contribution

The authors propose a cross-lingual interleaving technique for spoken language models and release an EN-FR training dataset together with EN-FR spoken StoryCloze and TopicCloze benchmarks for evaluation.

Findings

01

Interleaving improves monolingual semantic accuracy.

02

Interleaving enables robust cross-lingual continuation.

03

Interleaving strengthens cross-lingual hidden-state alignment.

Abstract

Spoken Language Models (SLMs) aim to learn linguistic competence directly from speech using discrete units, widening access to Natural Language Processing (NLP) technologies for languages with limited written resources. However, progress has been largely English-centric due to scarce spoken evaluation benchmarks and training data, making cross-lingual learning difficult. We present a cross-lingual interleaving method that mixes speech tokens across languages without textual supervision. We also release an EN-FR training dataset, TinyStories (~42k hours), together with EN-FR spoken StoryCloze and TopicCloze benchmarks for cross-lingual semantic evaluation, both synthetically generated using GPT-4. On 360M and 1B SLMs under matched training-token budgets, interleaving improves monolingual semantic accuracy, enables robust cross-lingual continuation, and strengthens cross-lingual…
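
To make the idea of mixing speech tokens across languages concrete, here is a minimal sketch. The abstract does not state the granularity at which tokens are mixed, so this example assumes sentence-aligned EN and FR sequences of discrete unit IDs and switches language only at sentence boundaries. The function name `interleave_units`, the `switch_prob` parameter, and the toy unit IDs are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence


def interleave_units(
    en_sentences: Sequence[Sequence[int]],
    fr_sentences: Sequence[Sequence[int]],
    switch_prob: float = 0.5,
    seed: int = 0,
) -> List[int]:
    """Build one mixed-language unit sequence from sentence-aligned inputs.

    Assumption (not from the paper): each entry of `en_sentences` and
    `fr_sentences` holds the discrete speech units of one sentence, and
    entry i in both lists carries the same content in the two languages.
    """
    if len(en_sentences) != len(fr_sentences):
        raise ValueError("expected sentence-aligned inputs of equal length")

    rng = random.Random(seed)
    use_en = rng.random() < 0.5  # pick the starting language at random
    mixed: List[int] = []
    for en_units, fr_units in zip(en_sentences, fr_sentences):
        # Emit the current sentence in the currently active language.
        mixed.extend(en_units if use_en else fr_units)
        # Possibly switch language before the next sentence.
        if rng.random() < switch_prob:
            use_en = not use_en
    return mixed


if __name__ == "__main__":
    # Toy unit IDs standing in for the output of a speech tokenizer.
    en = [[1, 2], [3], [4, 5]]
    fr = [[10], [11, 12], [13]]
    print(interleave_units(en, fr, switch_prob=0.5, seed=0))
```

No text or language tag enters the sequence, which matches the abstract's "without textual supervision". In this sketch, setting `switch_prob` to 0 yields a purely monolingual sequence, and setting it to 1 alternates languages at every sentence boundary.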

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques