Frequency-centroid features for word recognition of non-native English speakers

Pierre Berjon, Rajib Sharma, Avishek Nag, and Soumyabrata Dev

arXiv:2206.07176·cs.SD·June 16, 2022

TL;DR

This paper introduces frequency-centroid features that complement MFCCs for closed-set English word recognition of non-native speakers, with the gains most visible in noisy conditions. A two-stage CNN models words spoken with Arabic, French, and Spanish accents.

Contribution

The paper proposes frequency-centroid (FC) features, which capture the spectral centers of the bands defined by the Mel filterbank, as a complement to traditional MFCCs for recognizing non-native English speech.

Findings

01. Frequency-centroid features, used alongside MFCCs, improve word recognition accuracy.

02. The combined features outperform MFCCs alone, particularly in noisy conditions.

03. The approach is evaluated on English words spoken with Arabic, French, and Spanish accents.

Abstract

The objective of this work is to investigate complementary features which can aid the quintessential Mel frequency cepstral coefficients (MFCCs) in the task of closed, limited-set word recognition for non-native English speakers of different mother tongues. Unlike the MFCCs, which are derived from the spectral energy of the speech signal, the proposed frequency-centroids (FCs) encapsulate the spectral centres of the different bands of the speech spectrum, with the bands defined by the Mel filterbank. These features, in combination with the MFCCs, are observed to provide relative performance improvement in English word recognition, particularly under varied noisy conditions. A two-stage Convolutional Neural Network (CNN) is used to model the features of the English words uttered with Arabic, French and Spanish accents.
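To make the idea of a per-band spectral centre concrete, here is a minimal sketch of one way such features could be computed for a single speech frame. It assumes a power-weighted centroid inside each triangular Mel filter, which is a common definition of sub-band centroids; the abstract does not give the authors' exact formula. The function names, the 26-filter bank, the 512-point FFT, and the Hamming window are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def hz_to_mel(f):
    # Standard Hz -> Mel mapping (O'Shaughnessy formula).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular Mel filters, shape (n_filters, n_fft // 2 + 1)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                            n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    bank = np.zeros((n_filters, freqs.size))
    for m in range(n_filters):
        lo, mid, hi = hz_edges[m], hz_edges[m + 1], hz_edges[m + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        # Triangle: min of the two slopes, clipped to zero outside the band.
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank, freqs

def frequency_centroids(frame, sample_rate, n_filters=26, n_fft=512, eps=1e-10):
    """Hypothetical helper: one centroid (in Hz) per Mel band for one frame.

    Assumed definition: FC_m = sum_k f_k H_m(k) P(k) / sum_k H_m(k) P(k),
    where P is the power spectrum and H_m is the m-th Mel filter.
    """
    windowed = frame * np.hamming(frame.size)
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2
    bank, freqs = mel_filterbank(n_filters, n_fft, sample_rate)
    weighted = bank * power  # per-band, filter-weighted power
    return (weighted @ freqs) / (weighted.sum(axis=1) + eps)
```

In a feature pipeline, a vector like this could be computed for every frame and stacked next to the MFCCs of the same frame before being passed to a classifier. How the paper actually combines the two feature sets is not specified in the abstract.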

Code & Models

Repositories

pberjon/frequency-centroid-features-for-word-recognition-of-non-native-english-speakers (tf, Official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques

Methods: Convolution