Using heterogeneity in semi-supervised transcription hypotheses to improve code-switched speech recognition
Andrew Slottje, Shannon Wotherspoon, William Hartmann, Matthew Snover, Owen Kimball

TL;DR
This paper introduces a semi-supervised method for code-switched speech recognition that leverages multiple biased transcription models to improve performance, especially when monolingual data is asymmetrically matched to the languages involved.
Contribution
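It proposes a semi-supervised approach that combines transcription models biased toward different languages to improve code-switched ASR accuracy, addressing the asymmetry that arises when the available monolingual data match one language of the pair more closely than the other.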
Findings
Achieved a 19% relative improvement over semi-supervised systems that use a single transcription model.
Demonstrated effectiveness on English-Mandarin code-switching data.
Showed that combining biased models outperforms using only the best-matched monolingual data.
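The page does not say which error metric the 19% figure refers to. Assuming it is a relative reduction in error rate, with $E_{\text{single}}$ the error of the single-model semi-supervised system and $E_{\text{combined}}$ the error of the combined system, the arithmetic works as follows (the 20.0% baseline is a hypothetical number chosen only to illustrate the calculation):

```latex
\Delta_{\text{rel}} = \frac{E_{\text{single}} - E_{\text{combined}}}{E_{\text{single}}}
\quad\Rightarrow\quad
E_{\text{combined}} = E_{\text{single}}\,(1 - \Delta_{\text{rel}})

% Hypothetical example: E_single = 20.0\%, \Delta_rel = 0.19
E_{\text{combined}} = 20.0\% \times (1 - 0.19) = 16.2\%
```

A relative improvement is therefore not the same as an absolute one: in this hypothetical case the error drops by 3.8 points, not 19.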
Abstract
Modeling code-switched speech is an important problem in automatic speech recognition (ASR). Labeled code-switched data are rare, so monolingual data are often used to model code-switched speech. These monolingual data may be more closely matched to one of the languages in the code-switch pair. We show that such asymmetry can bias prediction toward the better-matched language and degrade overall model performance. To address this issue, we propose a semi-supervised approach for code-switched ASR. We consider the case of English-Mandarin code-switching, and the problem of using monolingual data to build bilingual "transcription models" for annotation of unlabeled code-switched data. We first build multiple transcription models so that their individual predictions are variously biased toward either English or Mandarin. We then combine these biased transcriptions using confidence-based…
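The abstract is cut off before it specifies how the biased transcriptions are combined. As a rough illustration only, the sketch below shows one generic confidence-based scheme: ROVER-style voting over hypotheses that are assumed to be already aligned slot by slot (real ROVER first builds that alignment with dynamic programming, which is skipped here). The slot format, the function name `combine_aligned`, and the example hypotheses are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical format: one (word, confidence) pair per aligned slot.
# A word of None marks a deletion in that hypothesis.
Slot = tuple[Optional[str], float]


def combine_aligned(hypotheses: list[list[Slot]]) -> list[str]:
    """For each aligned slot, keep the word with the highest summed confidence.

    Assumes all hypotheses were pre-aligned to the same number of slots.
    """
    if not hypotheses:
        return []
    n_slots = len(hypotheses[0])
    if any(len(h) != n_slots for h in hypotheses):
        raise ValueError("hypotheses must be pre-aligned to the same number of slots")

    output: list[str] = []
    for i in range(n_slots):
        # Accumulate confidence mass for each candidate word in this slot.
        scores: dict[Optional[str], float] = defaultdict(float)
        for hyp in hypotheses:
            word, conf = hyp[i]
            scores[word] += conf
        best = max(scores, key=scores.get)
        if best is not None:  # a winning deletion emits nothing
            output.append(best)
    return output


# Hypothetical outputs from an English-biased and a Mandarin-biased model.
en_biased = [("why", 0.4), ("想", 0.5), ("去", 0.6), ("shopping", 0.9), (None, 0.7)]
zh_biased = [("我", 0.8), ("想", 0.9), ("去", 0.9), ("下", 0.3), ("啊", 0.3)]

print(combine_aligned([en_biased, zh_biased]))  # ['我', '想', '去', 'shopping']
```

In this toy case each model contributes the slots it is confident about: the Mandarin-biased hypothesis wins the Mandarin words and the English-biased hypothesis wins the English word, which is the kind of complementarity that combining differently biased models is meant to exploit.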
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Phonetics and Phonology Research
