ML-SUPERB: Multilingual Speech Universal PERformance Benchmark

Jiatong Shi; Dan Berrebbi; William Chen; Ho-Lam Chung; En-Pei Hu; Wei Ping Huang; Xuankai Chang; Shang-Wen Li; Abdelrahman Mohamed; Hung-yi Lee; Shinji Watanabe

arXiv:2305.10615 · cs.SD · February 25, 2025 · 1 citation

TL;DR

ML-SUPERB extends the SUPERB benchmark to 143 languages, evaluating self-supervised speech models on multilingual automatic speech recognition and language identification.

Contribution

This paper introduces ML-SUPERB, a multilingual benchmark for speech SSL models covering 143 languages, enabling comprehensive evaluation beyond English.

Findings

1. Speech SSL models outperform FBANK features in multilingual tasks.
2. Multilingual models do not always outperform monolingual models.
3. ML-SUPERB will be released with datasets and scripts for future research.

Abstract

Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to benchmark the performance of Self-Supervised Learning (SSL) models on various speech processing tasks. However, SUPERB largely considers English speech in its evaluation. This paper presents multilingual SUPERB (ML-SUPERB), covering 143 languages (ranging from high-resource to endangered), and considering both automatic speech recognition and language identification. Following the concept of SUPERB, ML-SUPERB utilizes frozen SSL features and employs a simple framework for multilingual tasks by learning a shallow downstream model. Similar to the SUPERB benchmark, we find speech SSL models can significantly improve performance compared to FBANK features. Furthermore, we find that multilingual models do not always perform better than their monolingual counterparts. We will release ML-SUPERB as a challenge with…
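
The "frozen SSL features plus shallow downstream model" setup described in the abstract can be illustrated with a toy sketch. This is not the benchmark's code: the upstream here is a fixed random projection standing in for a pretrained SSL model, the data are synthetic, and every name (`frozen_upstream`, `ShallowLID`, `make_utterance`, `train_and_eval`) is hypothetical. Only language identification is sketched, with a single linear layer as the assumed downstream model.

```python
import math
import random

FRAME_DIM, FEAT_DIM, NUM_LANGS = 4, 8, 2
rng = random.Random(0)

# Hypothetical stand-in for a pretrained SSL upstream: a fixed random projection.
FROZEN_W = [[rng.uniform(-1, 1) for _ in range(FRAME_DIM)] for _ in range(FEAT_DIM)]


def frozen_upstream(frames):
    """Map each frame to a feature vector; FROZEN_W is never updated."""
    return [[math.tanh(sum(w * x for w, x in zip(row, frame))) for row in FROZEN_W]
            for frame in frames]


def mean_pool(features):
    """Average frame-level features over time into one utterance vector."""
    return [sum(col) / len(features) for col in zip(*features)]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ShallowLID:
    """Single linear layer plus softmax: the only trainable part."""

    def __init__(self, in_dim, num_classes):
        self.w = [[0.0] * in_dim for _ in range(num_classes)]
        self.b = [0.0] * num_classes

    def predict_proba(self, pooled):
        logits = [sum(wi * x for wi, x in zip(row, pooled)) + b
                  for row, b in zip(self.w, self.b)]
        return softmax(logits)

    def sgd_step(self, pooled, label, lr=0.5):
        # Gradient of cross-entropy w.r.t. the logits is (probs - one_hot).
        probs = self.predict_proba(pooled)
        for k, row in enumerate(self.w):
            grad = probs[k] - (1.0 if k == label else 0.0)
            for j, x in enumerate(pooled):
                row[j] -= lr * grad * x
            self.b[k] -= lr * grad


def make_utterance(label, num_frames=10):
    """Toy data: frames of 'language' 0 centre on +1, 'language' 1 on -1."""
    centre = 1.0 if label == 0 else -1.0
    return [[centre + rng.gauss(0, 0.3) for _ in range(FRAME_DIM)]
            for _ in range(num_frames)]


def train_and_eval(num_train=40, num_test=20, epochs=3):
    """Train only the shallow classifier on frozen features; return test accuracy."""
    model = ShallowLID(FEAT_DIM, NUM_LANGS)
    train = [(make_utterance(i % 2), i % 2) for i in range(num_train)]
    test = [(make_utterance(i % 2), i % 2) for i in range(num_test)]
    for _ in range(epochs):
        for frames, label in train:
            model.sgd_step(mean_pool(frozen_upstream(frames)), label)
    correct = 0
    for frames, label in test:
        probs = model.predict_proba(mean_pool(frozen_upstream(frames)))
        correct += int(probs.index(max(probs)) == label)
    return correct / num_test


if __name__ == "__main__":
    print(f"toy LID accuracy: {train_and_eval():.2f}")
```

The point of the design is that gradients touch only the parameters of `ShallowLID`, so differences in downstream accuracy reflect the quality of the frozen features rather than the capacity of the downstream model.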

Code & Models

Datasets

Bretagne/ml_superb_br (dataset · 10 downloads)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing