Frequency-centroid features for word recognition of non-native English speakers
Pierre Berjon, Rajib Sharma, Avishek Nag, and Soumyabrata Dev

TL;DR
This paper introduces frequency-centroid (FC) features that complement MFCCs for closed-set word recognition of non-native English speech, particularly in noisy conditions, using a two-stage CNN on words spoken with Arabic, French, and Spanish accents.
Contribution
The paper proposes frequency-centroid features, which capture the spectral centres of the bands defined by the Mel filterbank, as a complement to the energy-based MFCCs for recognizing non-native English speech.
Findings
Combined with MFCCs, frequency-centroid features give a relative improvement in English word recognition.
The improvement is most pronounced under varied noisy conditions.
The features are modelled on English words spoken with Arabic, French, and Spanish accents.
Abstract
The objective of this work is to investigate complementary features which can aid the quintessential Mel frequency cepstral coefficients (MFCCs) in the task of closed, limited-set word recognition for non-native English speakers of different mother tongues. Unlike the MFCCs, which are derived from the spectral energy of the speech signal, the proposed frequency-centroids (FCs) encapsulate the spectral centres of the different bands of the speech spectrum, with the bands defined by the Mel filterbank. These features, in combination with the MFCCs, are observed to provide relative performance improvement in English word recognition, particularly under varied noisy conditions. A two-stage Convolutional Neural Network (CNN) is used to model the features of the English words uttered with Arabic, French and Spanish accents.
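To make the idea of a per-band spectral centre concrete, here is a minimal sketch of one way such features could be computed for a single speech frame. The abstract does not give the exact formula, so this assumes the common subband-centroid definition: within each Mel band, the frequency average weighted by the filter response times the power spectrum. The function names, the 26-filter default, the Hamming window, and the triangular filter shape are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def hz_to_mel(f):
    # Common O'Shaughnessy-style Mel formula (an assumption; variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)  # bin frequencies in Hz
    edges = mel_to_hz(
        np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    )
    bank = np.zeros((n_filters, freqs.size))
    for m in range(n_filters):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        # Triangle: minimum of the two slopes, clipped at zero outside the band.
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank, freqs


def frequency_centroids(frame, sample_rate, n_filters=26, eps=1e-10):
    """Hypothetical helper: one centroid (in Hz) per Mel band for one frame.

    Assumed definition: sum_k f_k * H_m(k) * P(k) / sum_k H_m(k) * P(k),
    where P is the power spectrum and H_m the m-th Mel filter.
    """
    n_fft = frame.size
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    bank, freqs = mel_filterbank(n_filters, n_fft, sample_rate)
    weighted = bank * power  # shape: (n_filters, n_bins)
    return (weighted @ freqs) / (weighted.sum(axis=1) + eps)


# Usage: a 1 kHz tone sampled at 16 kHz, one 512-sample frame.
sr = 16000
t = np.arange(512) / sr
fc = frequency_centroids(np.sin(2 * np.pi * 1000.0 * t), sr)
```

In this sketch the MFCCs would summarize how much energy each band holds, while the centroids record where inside each band that energy sits, so the two could be stacked per frame before being fed to a classifier.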
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques
Methods: Convolution
