Adapting Whisper for Code-Switching through Encoding Refining and Language-Aware Decoding
Jiahui Zhao, Hao Shi, Chenrui Cui, Tianrui Wang, Hexin Liu, Zhaoheng Ni, Lingxuan Ye, Longbiao Wang

TL;DR
This paper adapts the Whisper speech recognition model to code-switching by refining the encoder output and introducing language-aware decoding, which improves recognition accuracy on the SEAME code-switching dataset.
Contribution
It introduces encoder refinement and language-aware adapters with a fusion module to adapt Whisper for code-switching recognition, surpassing previous methods.
Findings
Achieves 4.1% and 7.2% relative MER reductions over the baseline on the SEAME test sets.
Significantly improves recognition of non-native language speech.
Outperforms state-of-the-art methods on CS-ASR tasks.
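To make the "relative MER reduction" figures concrete, here is a minimal sketch of how such a metric is typically computed. It assumes MER denotes the mixed error rate commonly used for Mandarin-English code-switching (Mandarin scored per character, English per word); the tokenizer, function names, and all numbers in the usage lines are illustrative assumptions, not the paper's scoring script or results.

```python
import re

def mixed_tokens(text):
    """Split text into single Mandarin characters and whole English words."""
    # Assumption: CJK Unified Ideographs count one token per character,
    # runs of Latin letters (and apostrophes) count one token per word.
    return re.findall(r"[\u4e00-\u9fff]|[A-Za-z']+", text.lower())

def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]

def mer(ref_text, hyp_text):
    """Mixed error rate: token-level edits divided by reference length."""
    ref = mixed_tokens(ref_text)
    hyp = mixed_tokens(hyp_text)
    return edit_distance(ref, hyp) / len(ref)

def relative_reduction(baseline, proposed):
    """Fraction of the baseline error removed by the proposed system."""
    return (baseline - proposed) / baseline

# Hypothetical utterance: one deleted character and one substituted word.
print(mer("我想去 shopping mall", "我想 shopping more"))
# Hypothetical error rates, chosen only to show the arithmetic.
print(relative_reduction(20.0, 19.0))
```

A relative reduction is a ratio of error rates, so a 4.1% relative reduction is a much smaller change in absolute MER than 4.1 percentage points.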
Abstract
Code-switching (CS) automatic speech recognition (ASR) faces challenges due to the language confusion resulting from accents, auditory similarity, and seamless language switches. Adapting pre-trained multilingual models has shown promising performance for CS-ASR. In this paper, we adapt Whisper, a large-scale multilingual pre-trained speech recognition model, to CS on both the encoder and decoder sides. First, we propose an encoder refiner to enhance the encoder's capacity to handle intra-sentence switching. Second, we propose using two sets of language-aware adapters with different language prompt embeddings to obtain language-specific decoding information in each decoder layer. Then, a fusion module is added to fuse the language-aware decoding outputs. The experimental results on the SEAME dataset show that, compared with the baseline model, the proposed approach achieves a…
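The decoder-side idea in the abstract (two language-aware adapters followed by a fusion module) can be illustrated with a toy, framework-free sketch. Everything here is an assumption made for illustration: the abstract does not specify the adapter layout, how the language prompt embedding enters the adapter, or how fusion is computed. This sketch uses a residual bottleneck adapter, adds the prompt to the hidden state, and fuses the two branches with a scalar sigmoid gate; the names `adapter`, `fuse`, and all weights are hypothetical.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def adapter(h, prompt, W_down, W_up):
    """Residual bottleneck adapter: h + W_up * relu(W_down * (h + prompt)).

    Assumption: the language prompt embedding is simply added to the
    hidden state before the down-projection.
    """
    conditioned = [a + b for a, b in zip(h, prompt)]
    z = [max(0.0, v) for v in matvec(W_down, conditioned)]
    return [a + b for a, b in zip(h, matvec(W_up, z))]

def fuse(h_a, h_b, gate_w):
    """Gated fusion (assumed form): g * h_a + (1 - g) * h_b."""
    score = sum(w * v for w, v in zip(gate_w, h_a + h_b))
    g = 1.0 / (1.0 + math.exp(-score))
    return [g * a + (1.0 - g) * b for a, b in zip(h_a, h_b)]

# Toy decoder hidden state and two hypothetical language prompt embeddings.
h = [1.0, -1.0]
prompt_zh, prompt_en = [0.5, 0.0], [0.0, 0.5]

# One adapter per language, each with its own (made-up) weights.
h_zh = adapter(h, prompt_zh, W_down=[[1.0, 0.0]], W_up=[[1.0], [0.0]])
h_en = adapter(h, prompt_en, W_down=[[0.0, 1.0]], W_up=[[0.0], [1.0]])

# Zero gate weights give g = 0.5, i.e. a plain average of the two branches.
fused = fuse(h_zh, h_en, gate_w=[0.0, 0.0, 0.0, 0.0])
print(h_zh, h_en, fused)
```

In a real system the adapters and gate would be trained modules inside each Whisper decoder layer; the point of the sketch is only the data flow, with one shared hidden state, two language-conditioned branches, and one fused output.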
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
