ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets

Jiatong Shi, Shih-Heng Wang, William Chen, Martijn Bartelds, Vanya Bannihatti Kumar, Jinchuan Tian, Xuankai Chang, Dan Jurafsky, Karen Livescu, Hung-yi Lee, and Shinji Watanabe

arXiv:2406.08641·cs.SD·June 14, 2024

Open Access

TL;DR

ML-SUPERB 2.0 introduces a comprehensive benchmark for evaluating multilingual speech models across various configurations, highlighting the importance of downstream model design and dataset-specific challenges in improving speech recognition performance.

Contribution

This work extends ML-SUPERB to include diverse downstream models and fine-tuning setups, providing a more versatile benchmark for multilingual speech model evaluation.

Findings

01. Performance varies significantly with downstream model design.

02. Performance differs widely across languages and datasets.

03. Targeted approaches are needed to improve multilingual ASR.

Abstract

ML-SUPERB evaluates self-supervised learning (SSL) models on the tasks of language identification and automatic speech recognition (ASR). This benchmark treats the models as feature extractors and uses a single shallow downstream model, which can be fine-tuned for a downstream task. However, real-world use cases may require different configurations. This paper presents ML-SUPERB 2.0, which is a new benchmark for evaluating pre-trained SSL and supervised speech models across downstream models, fine-tuning setups, and efficient model adaptation approaches. We find performance improvements over the setup of ML-SUPERB. However, performance depends on the downstream model design. Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches to improve multilingual ASR performance.

Taxonomy

Topics: Natural Language Processing Techniques