Semi-supervised acoustic model training for speech with code-switching
Emre Yılmaz, Mitchell McLaren, Henk van den Heuvel, David A. van Leeuwen

TL;DR
This paper explores semi-supervised training of acoustic models for Frisian-Dutch code-switching speech, leveraging automatic annotation techniques to improve recognition with limited manual data.
Contribution
It introduces methods for automatic language and speaker annotation to enhance semi-supervised acoustic model training for low-resource code-switching speech.
Findings
Automatic annotations improve recognition accuracy.
Language and speaker tagging enhance semi-supervised training.
Results demonstrate potential for low-resource bilingual ASR.
Abstract
In the FAME! project, we aim to develop an automatic speech recognition (ASR) system for Frisian-Dutch code-switching (CS) speech extracted from the archives of a local broadcaster, with the ultimate goal of building a spoken document retrieval system. Unlike Dutch, Frisian is a low-resourced language with a very limited amount of manually annotated speech data. In this paper, we describe several automatic annotation approaches that enable the use of a large amount of raw bilingual broadcast data for acoustic model training in a semi-supervised setting. Previously, it has been shown that the best-performing ASR system is obtained by two-stage multilingual deep neural network (DNN) training using 11 hours of manually annotated CS speech (reference) data together with speech data from other high-resourced languages. We compare the quality of transcriptions provided by this bilingual ASR system…
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing
