Non-autoregressive Mandarin-English Code-switching Speech Recognition
Shun-Po Chuang, Heng-Jui Chang, Sung-Feng Huang, Hung-yi Lee

TL;DR
This paper introduces a non-autoregressive speech recognition model for Mandarin-English code-switching, utilizing Pinyin targets and novel regularization techniques to improve recognition accuracy and training efficiency.
Contribution
It applies the Mask-CTC non-autoregressive framework to Mandarin-English code-switching (CS) speech recognition and proposes new methods for faster training and improved contextual understanding.
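As a rough illustration of the Mask-CTC idea, the sketch below collapses greedy CTC output into tokens, then replaces low-confidence tokens with a mask symbol that a decoder would later fill in. The function names, the 0.9 threshold, the toy frame probabilities, and the choice of taking the maximum frame probability as token confidence are all assumptions made for illustration, not details taken from this paper.

```python
from itertools import groupby

BLANK = "<blank>"  # CTC blank symbol (name chosen for this sketch)
MASK = "<mask>"    # placeholder a decoder would later fill in


def ctc_greedy_with_confidence(frame_probs):
    """Collapse per-frame argmax tokens into (token, confidence) pairs.

    frame_probs: list of dicts mapping token -> probability, one per frame.
    Confidence is assumed to be the highest frame probability within each
    run of repeated tokens.
    """
    best = [max(p.items(), key=lambda kv: kv[1]) for p in frame_probs]
    collapsed = []
    for token, run in groupby(best, key=lambda kv: kv[0]):
        if token == BLANK:
            continue  # blanks only separate tokens; they are not emitted
        collapsed.append((token, max(prob for _, prob in run)))
    return collapsed


def mask_low_confidence(tokens, threshold=0.9):
    """Keep confident tokens and replace the rest with MASK."""
    return [tok if conf >= threshold else MASK for tok, conf in tokens]


# Toy code-switching utterance: Mandarin, English, Mandarin.
frames = [
    {"我": 0.95, BLANK: 0.05},
    {"我": 0.60, BLANK: 0.40},
    {BLANK: 0.90, "我": 0.10},
    {"like": 0.55, "light": 0.40, BLANK: 0.05},
    {BLANK: 0.80, "like": 0.20},
    {"你": 0.97, BLANK: 0.03},
]
tokens = ctc_greedy_with_confidence(frames)
masked = mask_low_confidence(tokens)
print(masked)
```

In this toy input the English token is uncertain (it competes with a similar-sounding word), so it is the one that gets masked. In a real system a conditional decoder would predict the masked positions from the unmasked context.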
Findings
Achieved promising results on the SEAME corpus.
Enhanced training speed with Pinyin-based encoder targets.
Improved recognition accuracy through novel regularization techniques.
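One plausible reason Pinyin targets make the encoder easier to train is that many Mandarin characters share a pronunciation, so the encoder's output vocabulary shrinks. The sketch below shows this with a tiny hand-written character-to-Pinyin table. The table, the tone-number notation, the function name, and the choice to pass English words through unchanged are assumptions for illustration, not the paper's actual lexicon or tokenization.

```python
# Hypothetical toy lexicon: three homophones for each of two syllables.
CHAR_TO_PINYIN = {
    "他": "ta1", "她": "ta1", "它": "ta1",
    "是": "shi4", "事": "shi4", "市": "shi4",
}


def to_pinyin_targets(tokens):
    """Map Mandarin characters to Pinyin; leave other tokens (e.g. English) as-is."""
    return [CHAR_TO_PINYIN.get(tok, tok) for tok in tokens]


sentence = ["她", "是", "student"]
pinyin_targets = to_pinyin_targets(sentence)
print(pinyin_targets)

# Homophones collapse: six characters map to only two Pinyin symbols.
n_chars = len(CHAR_TO_PINYIN)
n_pinyin = len(set(CHAR_TO_PINYIN.values()))
print(n_chars, n_pinyin)
```

The mapping is many-to-one, so recovering the characters from Pinyin requires context. That is the role the abstract assigns to the Pinyin-to-Mandarin decoder.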
Abstract
Mandarin-English code-switching (CS) is frequently used among East and Southeast Asian people. However, intra-sentence switching between two very different languages makes recognizing CS speech challenging. Meanwhile, recent successful non-autoregressive (NAR) ASR models remove the need for the left-to-right beam decoding used in autoregressive (AR) models and achieve outstanding performance and fast inference speed, but they have not been applied to Mandarin-English CS speech recognition. This paper takes advantage of the Mask-CTC NAR ASR framework to tackle the CS speech recognition issue. We further propose to change the Mandarin output target of the encoder to Pinyin for faster encoder training and introduce a Pinyin-to-Mandarin decoder to learn contextualized information. Moreover, we use word embedding label smoothing to regularize the decoder with contextualized…
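The abstract is truncated here, so the exact formulation of word embedding label smoothing is not shown. The sketch below gives one common reading of the idea: instead of spreading the smoothing mass uniformly over the vocabulary, it distributes that mass according to embedding similarity to the target word. The cosine-similarity softmax, the `epsilon` and `temperature` values, and the toy two-dimensional embeddings are all assumptions, not the authors' definition.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def embedding_smoothed_target(target, embeddings, epsilon=0.1, temperature=0.1):
    """Build a soft target distribution over the vocabulary.

    The true word keeps 1 - epsilon; the remaining epsilon is shared among
    the other words in proportion to exp(cosine similarity / temperature).
    """
    others = [w for w in embeddings if w != target]
    scores = [
        math.exp(cosine(embeddings[target], embeddings[w]) / temperature)
        for w in others
    ]
    total = sum(scores)
    dist = {target: 1.0 - epsilon}
    for word, score in zip(others, scores):
        dist[word] = epsilon * score / total
    return dist


# Toy embeddings: "猫" (cat) and "狗" (dog) are close, "车" (car) is far.
toy_embeddings = {
    "猫": [1.0, 0.1],
    "狗": [0.9, 0.2],
    "车": [-0.2, 1.0],
}
soft = embedding_smoothed_target("猫", toy_embeddings)
print(soft)
```

Compared with uniform label smoothing, this kind of target penalizes semantically close mistakes less than unrelated ones, which is one way a decoder could be regularized with word-level context.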
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing
Methods: Label Smoothing
