Adapting OpenAI's Whisper for Speech Recognition on Code-Switch Mandarin-English SEAME and ASRU2019 Datasets

Yuhang Yang; Yizhou Peng; Xionghu Zhong; Hao Huang; Eng Siong Chng

arXiv:2311.17382 · eess.AS · November 30, 2023 · APSIPA · 1 citation

Open Access

TL;DR

This study adapts OpenAI's Whisper model to Mandarin-English code-switch speech recognition. As little as 1 to 10 hours of adaptation data saturates the gains on SEAME, ASRU2019 keeps improving beyond 100 hours, and adaptation improves performance under every language-ID prompting strategy tested.

Contribution

It provides empirical evidence on effective adaptation of Whisper with limited data and various prompts for code-switch speech recognition.

Findings

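01

On SEAME, the performance gain saturates with as little as 1-10 hours of adaptation data.

02

On ASRU2019, performance continues to improve with more than 100 hours of adaptation data.

03

Different language-ID prompts initially produce different outcomes, but adapting Whisper with code-switch data improves recognition under all of them.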

Abstract

This paper details the experimental results of adapting OpenAI's Whisper model for code-switch Mandarin-English automatic speech recognition (ASR) on the SEAME and ASRU2019 corpora. We conducted two experiments: a) using adaptation data from 1 to 100/200 hours to demonstrate the effectiveness of adaptation, and b) examining different language ID setups in the Whisper prompt. The Mixed Error Rate results show that as little as $1 \sim 10$ hours of adaptation data may be enough to saturate the performance gain on SEAME, while the ASRU task continued to show performance gains with more adaptation data ($> 100$ hours). For the language prompt, the results show that although various prompting strategies initially produce different outcomes, adapting the Whisper model with code-switch data uniformly improves its performance. These results may also be relevant to the community when applying Whisper for related…
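The abstract reports Mixed Error Rate (MER) without defining it in this excerpt. The following is a minimal sketch, assuming the convention commonly used for Mandarin-English code-switch ASR: Mandarin is scored per character, English per word, and the edit distance is pooled over all utterances and divided by the number of reference tokens. The function names (`tokenize_mixed`, `edit_distance`, `mixed_error_rate`) and the regex-based tokenization are illustrative assumptions, not the authors' scoring script; in particular, this tokenizer silently drops digits and punctuation.

```python
import re

# Assumption: one token per CJK ideograph (basic block only),
# one token per run of Latin letters/apostrophes. Everything else is dropped.
_TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z']+")


def tokenize_mixed(text):
    """Split text into Mandarin characters and lowercased English words."""
    return [tok.lower() for tok in _TOKEN_RE.findall(text)]


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mixed_error_rate(references, hypotheses):
    """Pooled MER: total edit errors divided by total reference tokens."""
    errors = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = tokenize_mixed(ref)
        hyp_tokens = tokenize_mixed(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("references contain no scorable tokens")
    return errors / total
```

For example, scoring the hypothesis "我想 goes 吃饭" against the reference "我想 go 吃饭" counts one substituted English word out of five reference tokens (four Mandarin characters plus one English word). Pooling errors across utterances, rather than averaging per-utterance rates, keeps short utterances from dominating the score.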

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing