Semi-supervised acoustic model training for speech with code-switching

Emre Yılmaz; Mitchell McLaren; Henk van den Heuvel; David A. van Leeuwen

arXiv:1810.09699·cs.CL·October 24, 2018

TL;DR

This paper explores semi-supervised training of acoustic models for Frisian-Dutch code-switching speech, leveraging automatic annotation techniques to improve recognition with limited manual data.

Contribution

It introduces methods for automatic language and speaker annotation to enhance semi-supervised acoustic model training for low-resource code-switching speech.

Findings

01 Automatic annotations improve recognition accuracy.

02 Language and speaker tagging enhance semi-supervised training.

03 Results demonstrate potential for low-resource bilingual ASR.

Abstract

In the FAME! project, we aim to develop an automatic speech recognition (ASR) system for Frisian-Dutch code-switching (CS) speech extracted from the archives of a local broadcaster, with the ultimate goal of building a spoken document retrieval system. Unlike Dutch, Frisian is a low-resourced language with a very limited amount of manually annotated speech data. In this paper, we describe several automatic annotation approaches that enable the use of a large amount of raw bilingual broadcast data for acoustic model training in a semi-supervised setting. Previously, it has been shown that the best-performing ASR system is obtained by two-stage multilingual deep neural network (DNN) training using 11 hours of manually annotated CS speech (reference) data together with speech data from other high-resourced languages. We compare the quality of transcriptions provided by this bilingual ASR system…

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing