Linguistic-Acoustic Similarity Based Accent Shift for Accent Recognition

Qijie Shao; Jinghao Yan; Jian Kang; Pengcheng Guo; Xian Shi; Pengfei Hu; Lei Xie

arXiv:2204.03398 · cs.SD · July 4, 2022

Open Access

TL;DR

This paper introduces LASAS, a novel accent recognition method that estimates accent shifts using linguistic-acoustic similarity, improving accuracy by combining linguistic and acoustic features.

Contribution

The paper proposes a new accent shift estimation approach based on linguistic-acoustic similarity, enhancing accent recognition performance over traditional acoustic-only models.

Findings

01

Achieved 77.42% accuracy on the AESRC test set

02

Obtained a 6.94% relative improvement over a competitive system from the AESRC challenge

03

Effectively combines linguistic and acoustic features

Abstract

General accent recognition (AR) models tend to extract low-level information directly from spectra, and as a result they often overfit significantly to speakers or channels. Since an accent can be regarded as a series of shifts relative to native pronunciation, distinguishing accents becomes an easier task when the accent shift is used as input. However, without a native utterance to serve as an anchor, estimating the accent shift is difficult. In this paper, we propose linguistic-acoustic similarity based accent shift (LASAS) for AR tasks. For an accented speech utterance, the corresponding text vector is mapped to multiple accent-associated spaces to serve as anchors, and the accent shift is then estimated from the similarities between the acoustic embedding and those anchors. We then concatenate the accent shift with a dimension-reduced text vector to obtain a linguistic-acoustic bimodal representation. Compared with…
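The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `accent_shift_features`, the use of plain linear maps for the accent-associated spaces, cosine similarity as the similarity measure, a linear projection for dimension reduction, and the assumption that text vectors and acoustic embeddings are already aligned frame by frame are all illustrative choices that the abstract does not specify.

```python
import numpy as np


def accent_shift_features(text_vecs, acoustic_emb, anchor_maps, reduce_map, eps=1e-8):
    """Hypothetical sketch of a LASAS-style bimodal representation.

    text_vecs:    (T, d_text) frame-aligned text vectors (alignment is assumed)
    acoustic_emb: (T, d_ac)   acoustic embeddings
    anchor_maps:  (K, d_text, d_ac) one linear map per accent-associated space
    reduce_map:   (d_text, d_red)   linear dimension reduction for the text vector
    """
    # Map each text vector into K accent-associated spaces to get K anchors per frame.
    anchors = np.einsum("td,kda->kta", text_vecs, anchor_maps)  # (K, T, d_ac)

    # Cosine similarity between the acoustic embedding and each anchor
    # (cosine is an assumption; any similarity measure could be substituted).
    dots = np.einsum("kta,ta->tk", anchors, acoustic_emb)  # (T, K)
    norms = (
        np.linalg.norm(anchors, axis=-1).T
        * np.linalg.norm(acoustic_emb, axis=-1, keepdims=True)
    )
    accent_shift = dots / (norms + eps)  # (T, K)

    # Concatenate the accent shift with a dimension-reduced text vector.
    reduced_text = text_vecs @ reduce_map  # (T, d_red)
    return np.concatenate([accent_shift, reduced_text], axis=-1)


# Toy usage with random data; all sizes are arbitrary.
rng = np.random.default_rng(0)
T, d_text, d_ac, K, d_red = 5, 16, 8, 4, 3
features = accent_shift_features(
    rng.normal(size=(T, d_text)),
    rng.normal(size=(T, d_ac)),
    rng.normal(size=(K, d_text, d_ac)),
    rng.normal(size=(d_text, d_red)),
)
print(features.shape)
```

In this sketch each frame yields K similarity values (the estimated accent shift) followed by the reduced text vector, so a downstream classifier would see both the shift and the linguistic context it was measured against. In a trained system the maps would be learned rather than random.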

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing