Accent Recognition with Hybrid Phonetic Features

Zhan Zhang; Xi Chen; Yuehai Wang; Jianyi Yang

arXiv:2105.01920·eess.AS·May 6, 2021

Open Access

TL;DR

This paper uses an auxiliary ASR task to extract phonetic features and combines the embeddings of a fixed and a trainable acoustic model for accent recognition, reporting relative improvements of 6.57% on the validation set and 7.28% on the test set of the AESRC 2020 dataset.

Contribution

The paper introduces a hybrid structure that combines the embeddings of a fixed acoustic model and a trainable acoustic model, making the phonetic features used for accent recognition more robust.

Findings

01. 6.57% relative improvement on the validation set

02. 7.28% relative improvement on the test set

03. Enhanced robustness of the accent recognition system
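
The findings are stated as relative improvements, so a short sketch of that arithmetic may help. The function name and the example scores below are hypothetical and are not taken from the paper; the sketch also assumes a higher-is-better metric such as accuracy, which this page does not state.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the relative gain of `improved` over `baseline`, in percent.

    Assumes a higher-is-better metric (e.g. accuracy) and a non-zero baseline.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (improved - baseline) / baseline


# Hypothetical scores, for illustration only (not the paper's results).
baseline_score = 0.70
improved_score = 0.75
gain = relative_improvement(baseline_score, improved_score)
print(f"relative improvement: {gain:.2f}%")
```

A relative improvement is measured against the baseline score, so it is larger than the absolute difference between the two scores whenever the baseline is below 1.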

Abstract

The performance of voice-controlled systems is usually degraded by accented speech. To make these systems more robust, frontend accent recognition (AR) technologies have received increased attention in recent years. Because accent is a high-level abstract feature that is closely tied to language knowledge, AR is more challenging than other language-agnostic audio classification tasks. In this paper, we use an auxiliary automatic speech recognition (ASR) task to extract language-related phonetic features. Furthermore, we propose a hybrid structure that incorporates the embeddings of both a fixed acoustic model and a trainable acoustic model, making the language-related acoustic feature more robust. We conduct several experiments on the Accented English Speech Recognition Challenge (AESRC) 2020 dataset. The results demonstrate that our approach can obtain a 6.57%…
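
The hybrid structure described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: each acoustic model is reduced to a single linear projection, the two embeddings are fused by concatenation, frames are mean-pooled over time, and the class count of 8 is chosen only for illustration. All names (`hybrid_accent_logits`, `w_fixed`, `w_train`, `w_cls`) are hypothetical.

```python
import math
import random


def linear(x, w):
    """Multiply vector x (length d_in) by weight matrix w (d_in x d_out)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def random_matrix(rng, rows, cols):
    """Small random weight matrix standing in for a real network."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def hybrid_accent_logits(frames, w_fixed, w_train, w_cls):
    """Fuse fixed and trainable embeddings, pool over time, then classify.

    Concatenation and mean pooling are assumptions made for this sketch.
    """
    # Per-frame embeddings from both acoustic models, concatenated.
    fused = [linear(f, w_fixed) + linear(f, w_train) for f in frames]
    # Mean pooling over the time axis gives one utterance-level vector.
    dim = len(fused[0])
    pooled = [sum(v[k] for v in fused) / len(fused) for k in range(dim)]
    return linear(pooled, w_cls)


rng = random.Random(0)
FEAT_DIM, EMB_DIM, NUM_ACCENTS = 4, 3, 8  # hypothetical sizes

w_fixed = random_matrix(rng, FEAT_DIM, EMB_DIM)  # frozen: never updated in training
w_train = random_matrix(rng, FEAT_DIM, EMB_DIM)  # would be updated with the AR loss
w_cls = random_matrix(rng, 2 * EMB_DIM, NUM_ACCENTS)  # accent classifier head

# Ten random "acoustic frames" standing in for real features.
frames = [[rng.random() for _ in range(FEAT_DIM)] for _ in range(10)]
probs = softmax(hybrid_accent_logits(frames, w_fixed, w_train, w_cls))
print(len(probs))
```

In a real system, only `w_train` and `w_cls` would receive gradient updates, while `w_fixed` would keep the phonetic knowledge learned from the auxiliary ASR task unchanged.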

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing