Transformer-Transducers for Code-Switched Speech Recognition

Siddharth Dalmia; Yuzong Liu; Srikanth Ronanki; Katrin Kirchhoff

arXiv:2011.15023·cs.CL·February 16, 2021

TL;DR

This paper presents an end-to-end speech recognition system for code-switched speech built on a transformer-transducer architecture, with training strategies and model modifications that target intra-sentential language switching.

Contribution

It proposes three modifications to the vanilla transformer-transducer: two auxiliary loss functions for the low-resource code-switching setting, a mask-based training strategy that uses language ID information, and a multi-label encoder structure.

Findings

01

Achieved mixed error rates of 18.5% and 26.3% on the two test sets of the SEAME Mandarin-English code-switching corpus.

02

Demonstrated effectiveness of proposed methods over baseline models.

03

Validated on the SEAME dataset, with significant improvements.
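
Error rates for Mandarin-English code-switched speech, like the ones listed above, are commonly scored with Mandarin characters and English words as the unit tokens. The sketch below illustrates that convention only; it is not the authors' scoring script. The tokenization rule and the function names (`tokenize_mixed`, `edit_distance`, `mixed_error_rate`) are assumptions made for illustration.

```python
import re


def tokenize_mixed(text):
    """Split text into Mandarin characters and English words.

    Assumed rule: each CJK Unified Ideograph is one token, and each run of
    Latin letters or apostrophes is one token. Everything else is dropped.
    """
    return re.findall(r"[\u4e00-\u9fff]|[a-z']+", text.lower())


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def mixed_error_rate(refs, hyps):
    """Total edit errors divided by total reference tokens over a corpus."""
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens = tokenize_mixed(ref)
        errors += edit_distance(ref_tokens, tokenize_mixed(hyp))
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("reference transcripts contain no tokens")
    return errors / total
```

For example, scoring the hypothesis "我想 coffee 和 tee" against the reference "我想要 coffee 和 tea" counts one deleted character and one substituted word over six reference tokens.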

Abstract

We live in a world where 60% of the population can speak two or more languages fluently. Members of these communities constantly switch between languages when having a conversation. As automatic speech recognition (ASR) systems are being deployed in the real world, there is a need for practical systems that can handle multiple languages both within an utterance and across utterances. In this paper, we present an end-to-end ASR system using a transformer-transducer model architecture for code-switched speech recognition. We propose three modifications over the vanilla model in order to handle various aspects of code-switching. First, we introduce two auxiliary loss functions to handle the low-resource scenario of code-switching. Second, we propose a novel mask-based training strategy with language ID information to improve the label encoder training towards intra-sentential…
