Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC

Qingzheng Wang; Jiancheng Sun; Yifan Peng; Shinji Watanabe

arXiv:2505.24200 · cs.SD · June 4, 2025

TL;DR

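This paper improves multilingual speech models on ML-SUPERB 2.0 by combining data augmentation, an LID-aware CTC loss, and several adaptation strategies (frozen upstream training, partial fine-tuning, and low-rank adaptation), yielding a 14% relative improvement in LID accuracy and a 30% relative reduction in ASR CER over the baseline.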

Contribution

The paper introduces an LID-aware CTC loss as a regularizer and compares multiple strategies for fine-tuning speech foundation models, using data augmentation to narrow the performance gap in few-shot settings.

Findings

01 14% relative improvement in LID accuracy

02 30% relative reduction in ASR CER

03 Achieved second place in ML-SUPERB 2.0 Challenge
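
The first two findings are relative figures, not absolute percentage points. As a minimal sketch of how such numbers are usually computed, the snippet below uses the standard relative-change formula. The baseline and system values are hypothetical placeholders chosen only to make the arithmetic visible; they are not results from the paper.

```python
def relative_change(baseline: float, new: float) -> float:
    """Return (new - baseline) / baseline, the signed relative change."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


# HYPOTHETICAL values for illustration only (not taken from the paper).
baseline_lid_acc = 70.0   # LID accuracy in percent, higher is better
system_lid_acc = 79.8

baseline_cer = 20.0       # character error rate in percent, lower is better
system_cer = 14.0

# Relative improvement for a higher-is-better metric.
lid_relative_improvement = relative_change(baseline_lid_acc, system_lid_acc)

# Relative reduction for a lower-is-better metric: flip the sign.
cer_relative_reduction = -relative_change(baseline_cer, system_cer)

print(f"LID relative improvement: {lid_relative_improvement:.1%}")
print(f"CER relative reduction:   {cer_relative_reduction:.1%}")
```

With these placeholder values, an absolute gain of 9.8 accuracy points corresponds to a 14% relative improvement, and an absolute drop of 6 CER points corresponds to a 30% relative reduction, which is why relative and absolute figures should not be compared directly.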

Abstract

Multilingual speech processing with self-supervised or supervised pre-trained Speech Foundation Models (SFMs) has achieved strong performance on tasks like Language Identification (LID) and Automatic Speech Recognition (ASR). However, these models struggle with limited resources during fine-tuning. This paper enhances multilingual LID and ASR on ML-SUPERB 2.0 by exploring multiple strategies for adapting SFMs, including frozen upstream training, partial fine-tuning, and low-rank adaptation. Furthermore, we employ data augmentation to mitigate performance gaps in few-shot settings and introduce LID Connectionist Temporal Classification (CTC) loss for regularization. Our approach achieves a 14% relative improvement in LID accuracy and a 30% relative reduction in ASR CER over the baseline on ML-SUPERB 2.0, securing second place in the Interspeech 2025 ML-SUPERB 2.0 Challenge.
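
Of the adaptation strategies named in the abstract, low-rank adaptation is the easiest to illustrate in isolation. The sketch below shows the commonly used formulation, in which a frozen weight matrix W is augmented by a trainable low-rank product scaled by alpha / r. It is a pure-Python toy with made-up dimensions; the rank, scaling factor, and layers the authors actually adapt are not stated in the abstract, so every value here is an assumption.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the adapter rank.

    w: frozen weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    r = len(a)
    delta = matmul(b, a)  # (d_out, r) @ (r, d_in) -> (d_out, d_in)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy dimensions and hyperparameters (assumptions, not from the paper).
d_out, d_in, rank, alpha = 8, 8, 2, 4.0
rng = random.Random(0)

w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]     # frozen
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]   # trainable
b = [[0.0] * rank for _ in range(d_out)]                               # trainable, zero-initialised

w_eff = lora_effective_weight(w, a, b, alpha)

full_params = d_out * d_in
lora_params = rank * (d_in + d_out)
print(f"full fine-tuning parameters: {full_params}")
print(f"low-rank adapter parameters: {lora_params}")
```

Because B starts at zero, the adapted layer initially behaves exactly like the pre-trained one, and only the two small matrices receive gradient updates. That is the property that makes this family of methods attractive when fine-tuning data is scarce.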
