Transformer-Transducers for Code-Switched Speech Recognition
Siddharth Dalmia, Yuzong Liu, Srikanth Ronanki, Katrin Kirchhoff

TL;DR
This paper presents an end-to-end speech recognition system for code-switched speech, built on a transformer-transducer and extended with training strategies and model modifications that target intra-sentential language switching.
Contribution
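The paper proposes three modifications to the vanilla transformer-transducer: two auxiliary loss functions for the low-resource setting, a mask-based training strategy with language ID information, and a multi-label/multi-audio encoder structure that leverages monolingual speech corpora.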
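To make the idea of masking with language ID information concrete, here is a minimal sketch. It rests on an assumption that this summary does not confirm: that masking means randomly replacing tokens in the label history with a symbol for their language. The names `mask_with_language_id`, `<zh>`, and `<en>` are hypothetical, and the paper's actual scheme may differ.

```python
import random

# Hypothetical language-ID symbols; not taken from the paper.
LANG_ID = {"zh": "<zh>", "en": "<en>"}


def is_mandarin(token):
    """Treat a token as Mandarin if it contains a CJK Unified Ideograph."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in token)


def mask_with_language_id(tokens, mask_prob, rng):
    """Replace each token with its language-ID symbol with probability mask_prob.

    Assumption: the masked sequence would be fed to the label encoder so that
    it sees where the language switches without always seeing the exact token.
    """
    masked = []
    for tok in tokens:
        lang = "zh" if is_mandarin(tok) else "en"
        masked.append(LANG_ID[lang] if rng.random() < mask_prob else tok)
    return masked


if __name__ == "__main__":
    history = ["我", "想", "order", "pizza"]
    print(mask_with_language_id(history, mask_prob=0.5, rng=random.Random(0)))
```

Passing the random generator in explicitly keeps the masking reproducible across training runs, which is a design choice of this sketch rather than anything stated in the source.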
Findings
Achieved mixed error rates of 18.5% and 26.3% on the two test sets of the Mandarin-English SEAME corpus.
The proposed modifications improved on the vanilla transformer-transducer baseline.
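Error rates on Mandarin-English code-switched speech are commonly reported as a mixed error rate, which counts Mandarin at the character level and English at the word level. The sketch below shows one way to compute it. The function names and the use of the basic CJK Unified Ideographs range to detect Mandarin are assumptions made for illustration, not the paper's scoring script.

```python
import re

# Basic CJK Unified Ideographs block; assumed sufficient for common Mandarin text.
_CJK = re.compile(r"[\u4e00-\u9fff]")


def mixed_tokens(text):
    """Split Mandarin into single characters and keep English words whole."""
    tokens = []
    for chunk in text.split():
        word = ""
        for ch in chunk:
            if _CJK.match(ch):
                if word:  # flush any English word collected so far
                    tokens.append(word)
                    word = ""
                tokens.append(ch)
            else:
                word += ch
        if word:
            tokens.append(word)
    return tokens


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def mixed_error_rate(ref_text, hyp_text):
    """Edit distance over mixed tokens, divided by the reference length."""
    ref = mixed_tokens(ref_text)
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, mixed_tokens(hyp_text)) / len(ref)


if __name__ == "__main__":
    print(mixed_error_rate("我想 order 一个 pizza", "我想 order 一 pizza"))
```

In this usage example the reference splits into six tokens and the hypothesis drops one Mandarin character, so the rate is one error over six tokens.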
Abstract
We live in a world where 60% of the population can speak two or more languages fluently. Members of these communities constantly switch between languages when having a conversation. As automatic speech recognition (ASR) systems are being deployed in the real world, there is a need for practical systems that can handle multiple languages both within an utterance and across utterances. In this paper, we present an end-to-end ASR system using a transformer-transducer model architecture for code-switched speech recognition. We propose three modifications over the vanilla model in order to handle various aspects of code-switching. First, we introduce two auxiliary loss functions to handle the low-resource scenario of code-switching. Second, we propose a novel mask-based training strategy with language ID information to improve the label encoder training towards intra-sentential…
