AsyncSwitch: Asynchronous Text-Speech Adaptation for Code-Switched ASR
Tuan Nguyen, Huy-Dat Tran

TL;DR
AsyncSwitch is an asynchronous adaptation framework that improves code-switched ASR by pre-exposing the model to large-scale, code-switched web text before fine-tuning on paired speech-text data, yielding a 9.02% relative WER reduction on Malay-English code-switching while also improving monolingual recognition.
Contribution
It introduces a three-stage asynchronous adaptation method that first trains the decoder on code-switched text, then aligns the encoder and decoder through cross-attention using limited paired speech-text data, and finally fine-tunes the full model.
Findings
9.02% relative WER reduction on Malay-English code-switching
Improved monolingual performance in Singlish and Malay
Effective use of web data for multilingual ASR adaptation
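The headline figure is a relative reduction: it is measured against the baseline's WER rather than reported as an absolute difference in percentage points. A minimal sketch of the conversion follows; the function name is illustrative, not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both arguments are WERs on the same scale (e.g. percentages).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    # Improvement expressed as a fraction of the baseline error rate.
    return (baseline_wer - new_wer) / baseline_wer * 100.0
```

For example, a hypothetical drop from 20.0% to 15.0% WER is 5 points absolute but `relative_wer_reduction(20.0, 15.0)` gives 25.0, i.e. 25% relative.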
Abstract
Developing code-switched ASR systems is challenging due to language ambiguity and limited exposure to multilingual, code-switched data, while collecting such speech is costly. Prior work generates synthetic audio from text, but these methods are computationally intensive and hard to scale. We introduce AsyncSwitch, a novel asynchronous adaptation framework that leverages large-scale, text-rich web data to pre-expose ASR models to diverse code-switched domains before fine-tuning on paired speech-text corpora. Our three-stage process (1) trains decoder self-attention and feedforward layers on code-switched text, (2) aligns decoder and encoder via cross-attention using limited speech-text data, and (3) fully fine-tunes the entire model. Experiments with Whisper on Malay-English code-switching demonstrate a 9.02% relative WER reduction, while improving monolingual performance in Singlish,…
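The three-stage process above largely comes down to changing which parameters receive gradients at each stage. Below is a minimal sketch, in plain Python, of one way to express that schedule as a filter over parameter names. The module names (`self_attn`, `fc1`, `fc2`, `encoder_attn`) follow the Hugging Face Transformers Whisper implementation and are an assumption here. So is the choice to keep layer norms, embeddings, and the whole encoder frozen in stages 1 and 2, and to train only cross-attention in stage 2. The paper's exact parameter groups may differ.

```python
from typing import Iterable, List

# Hypothetical per-stage rules (an assumption, not the authors' code).
# Module names assume Hugging Face-style Whisper parameter naming.
STAGE_MODULES = {
    1: {"self_attn", "fc1", "fc2"},  # decoder self-attention + feed-forward (text only)
    2: {"encoder_attn"},             # decoder cross-attention (paired speech-text)
}


def is_trainable(param_name: str, stage: int) -> bool:
    """Decide whether a parameter is updated in the given stage."""
    if stage not in (1, 2, 3):
        raise ValueError(f"unknown stage: {stage}")
    if stage == 3:
        return True  # stage 3: full fine-tuning of every parameter
    parts = param_name.split(".")
    if "decoder" not in parts:
        return False  # encoder stays frozen in stages 1 and 2
    # Match whole name components, so "self_attn_layer_norm" is not "self_attn".
    return any(part in STAGE_MODULES[stage] for part in parts)


def trainable_params(names: Iterable[str], stage: int) -> List[str]:
    """Filter a list of parameter names down to those trained in `stage`."""
    return [name for name in names if is_trainable(name, stage)]


# Illustrative parameter names (hypothetical subset of a Whisper model).
names = [
    "model.encoder.layers.0.self_attn.q_proj.weight",
    "model.encoder.layers.0.fc1.weight",
    "model.decoder.layers.0.self_attn.q_proj.weight",
    "model.decoder.layers.0.self_attn_layer_norm.weight",
    "model.decoder.layers.0.encoder_attn.k_proj.weight",
    "model.decoder.layers.0.fc2.bias",
    "model.decoder.embed_tokens.weight",
]

for stage in (1, 2, 3):
    print(stage, trainable_params(names, stage))
```

With a PyTorch model, the same predicate could be applied as `for name, p in model.named_parameters(): p.requires_grad = is_trainable(name, stage)` before building the optimizer for each stage. Matching on whole name components rather than substrings keeps the rule from accidentally catching neighbouring modules such as layer norms.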
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Speech and Audio Processing
