Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond

Jiatong Shi; William Chen; Dan Berrebbi; Hsiu-Hsuan Wang; Wei-Ping Huang; En-Pei Hu; Ho-Lam Chuang; Xuankai Chang; Yuxun Tang; Shang-Wen Li; Abdelrahman Mohamed; Hung-yi Lee; Shinji Watanabe

arXiv:2310.05513 · cs.SD · February 25, 2025

Open Access

TL;DR

The 2023 ML-SUPERB Challenge expanded multilingual speech recognition benchmarks to include more languages and resources, revealing that scaling models alone is insufficient and that diverse speech types pose significant challenges.

Contribution

This paper introduces a comprehensive multilingual speech benchmark with new tracks and extensive language coverage, advancing evaluation of self-supervised models in multilingual speech tasks.

Findings

01 Scaling models alone is not enough for multilingual speech recognition.

02 Diverse speech and voice types significantly challenge multilingual models.

03 The benchmark includes 154 languages and 54 corpora, providing a broad evaluation framework.

Abstract

The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in multilingual speech recognition and language identification. The challenge comprises a Research Track focused on applying ML-SUPERB to specific multilingual subjects, a Challenge Track for model submissions, and a New Language Track where language resource researchers can contribute and evaluate their low-resource language data in the context of the latest progress in multilingual speech recognition. The challenge garnered 12 model submissions and 54 language corpora, resulting in a comprehensive benchmark encompassing 154 languages. The findings indicate that merely scaling models is not the definitive solution for multilingual speech tasks, and a variety of speech/voice types present significant challenges in multilingual…

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques