On the Cross-lingual Transferability of Pre-trained wav2vec2-based Models

Jonatas Grosman; Cassio Almeida; Guilherme Schardong; Hélio Lopes

arXiv:2511.21704·cs.CL·December 1, 2025

TL;DR

This study investigates how well wav2vec 2.0-based models transfer knowledge across different languages, revealing that data diversity and language similarity influence cross-lingual speech recognition performance.

Contribution

The paper provides a comprehensive analysis of the cross-lingual transferability of wav2vec 2.0 models across 18 languages, highlighting the importance of data diversity and language similarity.

Findings

01. Performance depends more on the diversity of the pre-training data than on its size.

02. In transfer tasks, results for Indo-European languages are better than for non-Indo-European languages.

03. Positive transfer was observed across all languages, especially when the pre-training language is similar to the target language.

Abstract

Using representations provided by a large pre-trained model has become the primary strategy for achieving state-of-the-art results in a wide range of tasks. A recently proposed large pre-trained model, wav2vec 2.0, was seminal for several other works on pre-training large models on speech data. Many models are being pre-trained using the same architecture as wav2vec 2.0 and are achieving state-of-the-art results in various speech-related tasks. Previous work has demonstrated that the data used during the pre-training of these wav2vec2-based models can impact the model's performance in downstream tasks, and this should be taken into consideration before utilizing these models. However, few works have investigated further how the knowledge transfer of these pre-trained models behaves in different languages, even when the target language differs from the one used during the model's…

Taxonomy

Topics: Speech Recognition and Synthesis · Language and cultural evolution · Topic Modeling