Multi-Accent Adaptation based on Gate Mechanism

Han Zhu; Li Wang; Pengyuan Zhang; Yonghong Yan

arXiv:2011.02774 · eess.AS · November 6, 2020 · 1 citation

TL;DR

This paper introduces a multi-accent speech recognition method using a gate mechanism that adapts models to multiple accents simultaneously, improving recognition accuracy especially when accent labels are unavailable at inference.

Contribution

The paper proposes two models for multi-accent adaptation: AST-G, which uses an accent-specific top layer with a gate mechanism to adapt to multiple accents simultaneously, and MTL-G, which integrates an accent classifier through multi-task learning.
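The idea of an accent-dependent gate can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not give the exact gate equation, so the form below (a sigmoid gate selected by the accent label and multiplied element-wise with shared hidden activations) is an assumption, and the names `gated_top_layer`, `gate_weights`, and `gate_bias` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_top_layer(hidden, accent_id, gate_weights, gate_bias):
    """Scale shared hidden activations by an accent-dependent gate.

    hidden:       activations from the shared lower layers (assumed input).
    accent_id:    integer index of the accent label.
    gate_weights: one weight vector per accent; a one-hot accent input
                  is equivalent to selecting one row (assumed form).
    gate_bias:    bias vector shared across accents.
    """
    row = gate_weights[accent_id]
    gate = [sigmoid(w + b) for w, b in zip(row, gate_bias)]
    # Element-wise product: each hidden unit is passed or damped per accent.
    return [h * g for h, g in zip(hidden, gate)]


# Toy values, chosen only for illustration.
hidden = [1.0, -2.0, 0.5]
gate_weights = [[0.0, 0.0, 0.0], [2.0, -2.0, 0.0]]
gate_bias = [0.0, 0.0, 0.0]

out_a = gated_top_layer(hidden, 0, gate_weights, gate_bias)
out_b = gated_top_layer(hidden, 1, gate_weights, gate_bias)
```

With these toy values the same hidden vector yields different outputs for the two accents, which is the behaviour a gate is meant to provide: one set of shared parameters, modulated per accent.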

Findings

01

AST-G achieves 9.8% and 1.9% average relative WER reduction over the baseline model and accent-specific adaptation, respectively.

02

MTL-G achieves a 5.1% relative WER reduction over the baseline, although it remains less accurate than accent-specific adaptation.

03

The gate mechanism improves multi-accent speech recognition performance in both proposed models.
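The findings above are stated as relative WER reductions. As a reminder of how that figure is usually computed, here is a small sketch. The function name `relative_wer_reduction` and the WER values are hypothetical and are not taken from the paper; only the formula (the drop in WER divided by the reference WER) is standard.

```python
def relative_wer_reduction(reference_wer, new_wer):
    """Return the relative WER reduction in percent.

    reference_wer: WER of the system being compared against (in %).
    new_wer:       WER of the improved system (in %).
    """
    if reference_wer <= 0:
        raise ValueError("reference_wer must be positive")
    return (reference_wer - new_wer) / reference_wer * 100.0


# Hypothetical WERs chosen so the arithmetic is easy to follow:
# dropping from 10.0% to 9.02% is a 0.98-point absolute reduction,
# which is 9.8% of the reference WER.
reduction = relative_wer_reduction(10.0, 9.02)
```

A relative reduction is therefore always tied to the system it is measured against, which is why the same model can show a large reduction over the baseline and a small one over accent-specific adaptation.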

Abstract

When only a limited amount of accented speech data is available, to promote multi-accent speech recognition performance, the conventional approach is accent-specific adaptation, which adapts the baseline model to multiple target accents independently. To simplify the adaptation procedure, we explore adapting the baseline model to multiple target accents simultaneously with multi-accent mixed data. Thus, we propose using an accent-specific top layer with gate mechanism (AST-G) to realize multi-accent adaptation. Compared with the baseline model and accent-specific adaptation, AST-G achieves 9.8% and 1.9% average relative WER reduction, respectively. However, in real-world applications, we can't obtain the accent category label for inference in advance. Therefore, we use an accent classifier to predict the accent label. To jointly train the acoustic model and the accent classifier, we…
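Joint training of an acoustic model and an accent classifier is commonly expressed as a weighted sum of two losses. The sketch below shows that generic multi-task pattern only; the abstract is truncated before it describes the authors' objective, so the loss form, the function names, and the weight `accent_weight=0.1` are all assumptions.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target])


def joint_loss(asr_probs, asr_target, accent_probs, accent_target,
               accent_weight=0.1):
    """Generic multi-task loss: main task plus weighted auxiliary task.

    asr_probs / asr_target:       acoustic model posteriors and label.
    accent_probs / accent_target: accent classifier posteriors and label.
    accent_weight:                hypothetical interpolation weight.
    """
    asr_loss = cross_entropy(asr_probs, asr_target)
    accent_loss = cross_entropy(accent_probs, accent_target)
    return asr_loss + accent_weight * accent_loss


# Toy posteriors: the acoustic model is unsure, the classifier is certain,
# so only the acoustic term contributes to the total.
loss = joint_loss([0.5, 0.5], 0, [1.0, 0.0], 0)
```

In a setup like this, both tasks share lower layers, so gradients from the accent term shape representations that the acoustic model also uses.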

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders