Pronunciation recognition of English phonemes /\textipa{@}/, /{\ae}/, /\textipa{A}:/ and /\textipa{2}/ using Formants and Mel Frequency Cepstral Coefficients
Keith Y. Patarroyo, Vladimir Vargas-Calder\'on

TL;DR
This study compares formant and MFCC-based speech recognition methods for distinguishing similar English vowels, achieving moderate accuracy and highlighting the challenges in replicating human vowel discrimination.
Contribution
The paper presents a quantitative comparison of formant and MFCC features for vowel recognition and shows the limits of both in differentiating similar-sounding phonemes.
Findings
Formant analysis achieved 70% accuracy.
MFCC analysis achieved 52% accuracy.
Ignoring /\textipa{@}/ increased accuracy to 71%.
Abstract
The Vocal Joystick Vowel Corpus, by the University of Washington, was used to study monophthongs pronounced by native English speakers. The objective of this study was to quantitatively measure the extent to which speech recognition methods can distinguish between similar-sounding vowels. In particular, the phonemes /\textipa{@}/, /{\ae}/, /\textipa{A}:/ and /\textipa{2}/ were analysed. 748 sound files from the corpus were subjected to Linear Predictive Coding (LPC) to compute their formants, and to the Mel Frequency Cepstral Coefficients (MFCC) algorithm to compute their cepstral coefficients. A Decision Tree Classifier was used to build a predictive model that learnt the patterns of the first two formants measured in the data set, as well as the patterns of the 13 cepstral coefficients. An accuracy of 70\% was achieved using formants for the mentioned phonemes. For the MFCC analysis an…
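As a rough illustration of the formant step mentioned in the abstract, the sketch below estimates the first two formants of a signal with autocorrelation-method LPC followed by peak picking on the LPC spectral envelope. It is not the authors' implementation. The function names, the LPC order of 4, the 8 kHz sampling rate, and the synthetic two-resonance test signal (resonances placed arbitrarily at 700 Hz and 1200 Hz) are all assumptions made for this example, and only the Python standard library is used.

```python
import cmath
import math


def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a real sequence."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns [1, a1, ..., ap] for A(z) = 1 + sum a_k z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def lpc_envelope_peaks(a, fs, step_hz=5.0):
    """Frequencies (Hz) of local maxima of 1/|A(e^{jw})| between 0 and fs/2."""
    n_points = int((fs / 2) / step_hz) + 1
    freqs = [i * step_hz for i in range(n_points)]
    mags = []
    for f in freqs:
        w = 2 * math.pi * f / fs
        denom = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        mags.append(1.0 / abs(denom))
    return [freqs[i] for i in range(1, n_points - 1)
            if mags[i - 1] < mags[i] >= mags[i + 1]]


def synthetic_vowel(formants_hz, bandwidth_hz, fs, n_samples):
    """Hypothetical test signal: impulse response of an all-pole filter
    with one resonance per requested formant frequency."""
    a = [1.0]
    radius = math.exp(-math.pi * bandwidth_hz / fs)
    for f in formants_hz:
        theta = 2 * math.pi * f / fs
        section = [1.0, -2 * radius * math.cos(theta), radius * radius]
        prod = [0.0] * (len(a) + 2)
        for i, ai in enumerate(a):  # polynomial multiplication
            for j, sj in enumerate(section):
                prod[i + j] += ai * sj
        a = prod
    h = []
    for n in range(n_samples):
        y = 1.0 if n == 0 else 0.0
        for k in range(1, len(a)):
            if n - k >= 0:
                y -= a[k] * h[n - k]
        h.append(y)
    return h


FS = 8000  # assumed sampling rate in Hz
signal = synthetic_vowel([700.0, 1200.0], 80.0, FS, 800)
coeffs = levinson_durbin(autocorrelation(signal, 4), 4)
f1, f2 = lpc_envelope_peaks(coeffs, FS)[:2]
print(f"F1 ~ {f1:.0f} Hz, F2 ~ {f2:.0f} Hz")
```

On real recordings one would typically window short frames and use a higher LPC order. The resulting (F1, F2) pairs are the kind of two-dimensional feature a decision tree classifier could then be trained on.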
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research
