Advancing Multi-Accented LSTM-CTC Speech Recognition using a Domain Specific Student-Teacher Learning Paradigm

Shahram Ghorbani; Ahmet E. Bulut; John H.L. Hansen

arXiv:1809.06833 · eess.AS · October 3, 2019 · SLT · 1 citation

Open Access

TL;DR

This paper introduces a domain-specific student-teacher learning paradigm for multi-accent speech recognition with LSTM-CTC models. Accent-specific teacher models with aligned outputs transfer their knowledge to a single student model through knowledge distillation, improving recognition accuracy across accents.

Contribution

The paper proposes a multi-accent learning framework in which aligned accent-specific teacher models guide a single student model, reducing character error rate (CER) and adapting the model to accented speech.

Findings

01. 20.1% relative CER reduction with the proposed method

02. Aligned accent-specific models improve recognition accuracy

03. Knowledge distillation enhances accent adaptation performance
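
The 20.1% figure in the findings is a relative reduction in character error rate (CER), not an absolute one. As a minimal sketch of the difference, the Python below computes CER from a character-level edit distance and then the relative reduction between two systems. The function names and the example error rates are illustrative assumptions, not values taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline_cer, new_cer):
    """Fraction of the baseline error that the new system removes."""
    return (baseline_cer - new_cer) / baseline_cer


# Hypothetical error rates, chosen only to illustrate the arithmetic.
baseline, improved = 0.200, 0.1598
print(f"absolute: {baseline - improved:.4f}")
print(f"relative: {relative_reduction(baseline, improved):.1%}")
```

With these made-up rates, a drop from 20.0% to 15.98% CER is a 4.02-point absolute change but a 20.1% relative one.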

Abstract

Non-native speech causes automatic speech recognition systems to degrade in performance. Past strategies to address this challenge have considered model adaptation, accent classification with model selection, alternate pronunciation lexicons, etc. In this study, we consider a recurrent neural network (RNN) with a connectionist temporal classification (CTC) cost function trained on multi-accent English data including US (native), Indian, and Hispanic accents. We exploit dark knowledge from a model trained with the multi-accent data to train student models under the guidance of both a teacher model and the CTC cost of the target transcription. We show that transferring knowledge from a single RNN-CTC trained model toward a student model yields better performance than the stand-alone teacher model. Since the outputs of different trained CTC models are not necessarily aligned, it is not possible to…
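
The abstract describes training a student under the guidance of both a teacher model and the CTC cost of the target transcription. One common way to realize such an objective is to interpolate the CTC loss with a frame-level KL divergence between teacher and student posteriors. The sketch below does this in plain Python for a single utterance. The interpolation weight `lam`, the function names, and the choice of a frame-level KL term are assumptions for illustration, since the paper's exact loss is not shown in this excerpt.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one frame of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def logsumexp(xs):
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """CTC forward algorithm: -log p(target | frames) for one utterance."""
    ext = [blank]                      # target interleaved with blanks
    for label in target:
        ext += [label, blank]
    S = len(ext)
    alpha = [-math.inf] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        new = [-math.inf] * S
        for s in range(S):
            cands = [alpha[s]]
            if s >= 1:
                cands.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(cands) + log_probs[t][ext[s]]
        alpha = new
    return -logsumexp(alpha[-2:])      # end in last label or trailing blank


def frame_kl(teacher_lp, student_lp):
    """Mean per-frame KL(teacher || student) over log-posteriors."""
    total = 0.0
    for t_row, s_row in zip(teacher_lp, student_lp):
        total += sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t_row, s_row))
    return total / len(teacher_lp)


def student_loss(student_logits, teacher_logits, target, lam=0.5):
    """(1 - lam) * CTC(student, target) + lam * KL(teacher || student).

    `lam` is a hypothetical interpolation weight, not a value from the paper.
    """
    s_lp = [log_softmax(r) for r in student_logits]
    t_lp = [log_softmax(r) for r in teacher_logits]
    ctc = ctc_neg_log_likelihood(s_lp, target)
    return (1.0 - lam) * ctc + lam * frame_kl(t_lp, s_lp)
```

A frame-level term like this only makes sense when teacher and student emit their labels at compatible frames, which is why the abstract's remark that different CTC models are not necessarily aligned matters once several teachers are involved.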

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing