AISPEECH-SJTU accent identification system for the Accented English Speech Recognition Challenge

Houjun Huang; Xu Xiang; Yexin Yang; Rao Ma; Yanmin Qian

arXiv:2102.09828 · cs.SD · February 22, 2021 · 6 cites

TL;DR

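This paper presents an accent identification system for accented English speech that combines ASR-based phone posteriorgram (PPG) features, TTS-based data augmentation, test-time augmentation, and embedding fusion. The system ranked first in the accent identification track of the Interspeech-2020 Accented English Speech Recognition Challenge.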

Contribution

The paper introduces a TTS-based data augmentation method to expand the very limited accent training data, together with test-time augmentation and embedding fusion schemes that further improve accent identification performance.

Findings

01 Achieved 83.63% accuracy on challenge data

02 Outperformed all competitors by over 10%

03 Validated effectiveness of PPG features and augmentation methods

Abstract

This paper describes the AISpeech-SJTU system for the accent identification track of the Interspeech-2020 Accented English Speech Recognition Challenge. In this challenge track, only 160 hours of accented English data collected from 8 countries and the auxiliary Librispeech dataset are provided for training. To build an accurate and robust accent identification system, we explore the whole system pipeline in detail. First, we introduce the ASR-based phone posteriorgram (PPG) feature to accent identification and verify its efficacy. Then, for the first time, a novel TTS-based approach is carefully designed to augment the very limited accent training data. Finally, we propose test-time augmentation and embedding fusion schemes to further improve system performance. Our final system is ranked first in the challenge and outperforms all the other participants by a large margin. The…
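The abstract does not spell out how test-time augmentation is applied, so the following is only a minimal sketch of one common form of the idea: score several perturbed copies of a test utterance and average the class posteriors. The speed factors, the interpolation-based `speed_perturb`, and the `toy_score_fn` classifier are all hypothetical stand-ins chosen for illustration, not the authors' implementation. Only the count of 8 accent classes is taken from the abstract.

```python
import numpy as np

N_ACCENTS = 8  # the challenge data covers 8 countries

# Hypothetical fixed projection standing in for a trained accent classifier.
_W = np.random.default_rng(0).standard_normal((N_ACCENTS, 2))


def toy_score_fn(wave):
    """Hypothetical classifier: maps a waveform to N_ACCENTS logits."""
    feats = np.array([wave.mean(), wave.std()])
    return _W @ feats


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def speed_perturb(wave, factor):
    """Assumed augmentation: change speed by linear-interpolation resampling."""
    n_out = max(1, int(round(len(wave) / factor)))
    src = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(src, np.arange(len(wave)), wave)


def tta_predict(score_fn, wave, factors=(0.9, 1.0, 1.1)):
    """Average class posteriors over several perturbed copies of one utterance."""
    probs = [softmax(score_fn(speed_perturb(wave, f))) for f in factors]
    mean_probs = np.mean(probs, axis=0)
    return int(np.argmax(mean_probs)), mean_probs


if __name__ == "__main__":
    # One second of synthetic audio at 16 kHz as a placeholder utterance.
    wave = np.random.default_rng(1).standard_normal(16000)
    pred, probs = tta_predict(toy_score_fn, wave)
    print(pred, probs.round(3))
```

Averaging posteriors rather than logits is one design choice among several. Averaging embeddings before the classifier would be an alternative, closer in spirit to the embedding fusion the abstract mentions.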

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing