Using heterogeneity in semi-supervised transcription hypotheses to improve code-switched speech recognition

Andrew Slottje; Shannon Wotherspoon; William Hartmann; Matthew Snover; Owen Kimball

arXiv:2106.07699·cs.CL·June 16, 2021

Open Access

TL;DR

This paper introduces a semi-supervised method for code-switched speech recognition that combines multiple biased transcription models to improve performance, especially when the available monolingual data are better matched to one language of the pair than to the other.

Contribution

The paper proposes a semi-supervised approach that combines biased transcription models to improve code-switched ASR accuracy when monolingual data are asymmetrically matched to the two languages.

Findings

01. Achieved a 19% relative improvement over single-model semi-supervised systems.

02. Demonstrated effectiveness on English-Mandarin code-switching data.

03. Showed that combining biased models outperforms using only the best-matched monolingual data.
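
As a reading aid for the relative-improvement figure in the findings, here is a minimal sketch of how a relative improvement is usually computed. It assumes a lower-is-better error metric such as word error rate; this summary does not name the metric, and the function name and the numbers in the usage note are hypothetical, not results from the paper.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Fractional reduction in error relative to the baseline.

    Assumes a lower-is-better metric (for example an error rate in percent).
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical numbers chosen only to illustrate the arithmetic.
baseline = 50.0   # hypothetical baseline error rate, in percent
improved = 40.5   # hypothetical improved error rate, in percent
print(f"{relative_improvement(baseline, improved):.0%}")
```

Under this definition, a relative improvement is a fraction of the baseline error, not an absolute difference in percentage points.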

Abstract

Modeling code-switched speech is an important problem in automatic speech recognition (ASR). Labeled code-switched data are rare, so monolingual data are often used to model code-switched speech. These monolingual data may be more closely matched to one of the languages in the code-switch pair. We show that such asymmetry can bias prediction toward the better-matched language and degrade overall model performance. To address this issue, we propose a semi-supervised approach for code-switched ASR. We consider the case of English-Mandarin code-switching, and the problem of using monolingual data to build bilingual "transcription models" for annotation of unlabeled code-switched data. We first build multiple transcription models so that their individual predictions are variously biased toward either English or Mandarin. We then combine these biased transcriptions using confidence-based…
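
The abstract is cut off before it states how the biased transcriptions are combined, so the following is only an illustrative sketch of one possible confidence-based rule: for each unlabeled utterance, keep the hypothesis from whichever transcription model reports the highest confidence. The selection rule, the `Hypothesis` type, the function names, and the model labels are all assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hypothesis:
    model: str         # hypothetical label for the transcription model
    text: str          # hypothesized transcript for one utterance
    confidence: float  # utterance-level confidence score, assumed comparable across models


def select_by_confidence(hypotheses: List[Hypothesis]) -> Hypothesis:
    """Return the highest-confidence hypothesis (an assumed combination rule)."""
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    return max(hypotheses, key=lambda h: h.confidence)


def build_pseudo_labels(per_utterance: Dict[str, List[Hypothesis]]) -> Dict[str, str]:
    """Map each utterance id to the transcript of its selected hypothesis."""
    return {
        utt_id: select_by_confidence(hyps).text
        for utt_id, hyps in per_utterance.items()
    }


# Hypothetical hypotheses from an English-biased and a Mandarin-biased model.
candidates = {
    "utt1": [
        Hypothesis("english_biased", "i want to eat noodles", 0.62),
        Hypothesis("mandarin_biased", "我 want to eat 面", 0.81),
    ],
    "utt2": [
        Hypothesis("english_biased", "see you tomorrow", 0.90),
        Hypothesis("mandarin_biased", "see you 明天", 0.55),
    ],
}
pseudo_labels = build_pseudo_labels(candidates)
```

In a semi-supervised setup of this kind, the selected transcripts would serve as pseudo-labels for the unlabeled code-switched audio. This sketch also assumes that confidence scores from different models are directly comparable, which in practice may require calibration.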

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Phonetics and Phonology Research