Multilingual Approach to Joint Speech and Accent Recognition with DNN-HMM Framework

Yizhou Peng; Jicheng Zhang; Haobo Zhang; Haihua Xu; Hao Huang; Eng Siong Chng

arXiv:2010.11483·eess.AS·May 11, 2021·1 cites

TL;DR

This paper introduces a multilingual DNN-HMM framework for joint speech and accent recognition. It treats different accents of English as different languages and recognizes speech and accent simultaneously, with WERs close to those of conventional ASR systems that ignore the accent.

Contribution

The paper presents a novel multilingual approach to joint speech and accent recognition using the DNN-HMM framework. Experiments on an 8-accent English task show WERs close to those of conventional ASR systems that ignore the accent.

Findings

01. Achieved WERs close to those of conventional ASR systems that ignore accents

02. Realized both word-based and utterance-based accent recognition

03. Provided extensive analysis of transfer learning and accent confusion

Abstract

Humans can recognize speech and, simultaneously, the particular accent of that speech. However, present state-of-the-art ASR systems can rarely do so. In this paper, we propose a multilingual approach to recognizing English speech and the accent the speaker conveys, using the DNN-HMM framework. Specifically, we treat different accents of English as different languages. We then merge them and train a multilingual ASR system. During decoding, we conduct two experiments. One is monolingual ASR-based decoding with the accent information embedded at the phone level, realizing word-based accent recognition (AR); the other is multilingual ASR-based decoding, realizing an approximated utterance-based AR. Experimental results on an 8-accent English speech recognition task show that both methods yield WERs close to those of conventional ASR systems that completely ignore the accent, as…
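
The idea of treating accents as languages and embedding accent information at the phone level can be illustrated with a small sketch. This is not the authors' implementation: the lexicon layout, the `_ACCENT` phone suffix, the accent tags `US` and `UK`, the helper names, and the majority vote used to approximate an utterance-level accent are all assumptions made for illustration.

```python
from collections import Counter


def merge_lexicons(lexicons_by_accent):
    """Merge per-accent lexicons into one, tagging each phone with its accent.

    Hypothetical layout: each accent contributes one pronunciation variant
    per word, and every phone in that variant carries an accent suffix.
    """
    merged = {}
    for accent, lexicon in lexicons_by_accent.items():
        for word, phones in lexicon.items():
            tagged = [f"{phone}_{accent}" for phone in phones]
            merged.setdefault(word, []).append(tagged)
    return merged


def accent_of_pronunciation(tagged_phones):
    """Read the accent of one decoded word from its accent-tagged phones."""
    tags = {phone.rsplit("_", 1)[1] for phone in tagged_phones}
    if len(tags) != 1:
        raise ValueError(f"mixed or missing accent tags: {tagged_phones}")
    return tags.pop()


def utterance_accent(word_accents):
    """Approximate an utterance-level accent by majority vote over words.

    The vote is an assumption of this sketch, not a rule stated in the abstract.
    """
    return Counter(word_accents).most_common(1)[0][0]


# Toy lexicons with made-up accent tags and phone symbols.
lexicons = {
    "US": {"water": ["w", "ao", "t", "er"]},
    "UK": {"water": ["w", "ao", "t", "ah"]},
}
merged = merge_lexicons(lexicons)
word_accent = accent_of_pronunciation(merged["water"][1])
utt_accent = utterance_accent(["UK", "US", "UK"])
```

In this toy setup the decoder would choose among accent-tagged pronunciation variants of the same word, so the chosen variant reveals a per-word accent, and the per-word labels can then be pooled into one label for the utterance.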


Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing