Pronunciation recognition of English phonemes /ə/, /æ/, /ɑː/ and /ʌ/ using Formants and Mel Frequency Cepstral Coefficients

Keith Y. Patarroyo; Vladimir Vargas-Calderón

arXiv:1702.07071·cs.CL·February 24, 2017·1 cites

Open Access

TL;DR

This study compares formant and MFCC-based speech recognition methods for distinguishing similar English vowels, achieving moderate accuracy and highlighting the challenges in replicating human vowel discrimination.

Contribution

It introduces a quantitative comparison of formant and MFCC features for vowel recognition, demonstrating their limitations in differentiating similar phonemes.

Findings

01 Formant analysis achieved 70% accuracy.

02 MFCC analysis achieved 52% accuracy.

03 Ignoring /ə/ increased accuracy to 71%.

Abstract

The Vocal Joystick Vowel Corpus, by the University of Washington, was used to study monophthongs pronounced by native English speakers. The objective of this study was to quantitatively measure the extent to which speech recognition methods can distinguish between similar-sounding vowels. In particular, the phonemes /ə/, /æ/, /ɑː/ and /ʌ/ were analysed. 748 sound files from the corpus were subjected to Linear Predictive Coding (LPC) to compute their formants, and to the Mel Frequency Cepstral Coefficients (MFCC) algorithm to compute their cepstral coefficients. A Decision Tree Classifier was used to build a predictive model that learnt the patterns of the first two formants measured in the data set, as well as the patterns of the 13 cepstral coefficients. An accuracy of 70% was achieved using formants for the mentioned phonemes. For the MFCC analysis an…
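
The formant step named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes autocorrelation-method LPC, a Hamming window, and root-finding on the prediction polynomial, and it runs on a synthetic noise-excited signal rather than the corpus. The function names (`synth_vowel`, `lpc_coefficients`, `estimate_formants`), the LPC order, and the frequency and bandwidth thresholds are all hypothetical choices made for illustration.

```python
import numpy as np


def synth_vowel(formants_hz, sample_rate, n_samples, bandwidth_hz=100.0, seed=0):
    """Hypothetical test signal: white noise passed through an all-pole filter
    with one resonance per requested formant frequency."""
    radius = np.exp(-np.pi * bandwidth_hz / sample_rate)
    angles = 2.0 * np.pi * np.asarray(formants_hz) / sample_rate
    poles = radius * np.exp(1j * np.concatenate((angles, -angles)))
    a = np.real(np.poly(poles))  # filter denominator [1, a1, ..., ap]
    excitation = np.random.default_rng(seed).standard_normal(n_samples)
    x = np.zeros(n_samples)
    for n in range(n_samples):
        acc = excitation[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * x[n - k]
        x[n] = acc
    return x


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC. Returns [1, a1, ..., ap] for the
    prediction-error filter A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations (Yule-Walker): R a = -r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))


def estimate_formants(signal, sample_rate, order=8, n_formants=2):
    """Estimate the lowest formants from the angles of the LPC poles."""
    frame = signal * np.hamming(len(signal))
    roots = np.roots(lpc_coefficients(frame, order))
    roots = roots[np.imag(roots) > 0]  # keep one pole of each conjugate pair
    freqs = np.angle(roots) * sample_rate / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * sample_rate / np.pi
    # Assumed heuristic: discard near-DC and very broad (non-formant) poles.
    keep = (freqs > 90.0) & (bandwidths < 400.0)
    return np.sort(freqs[keep])[:n_formants]


if __name__ == "__main__":
    fs = 8000
    x = synth_vowel((700.0, 1200.0), fs, 4000)
    print(estimate_formants(x, fs, order=4))
```

In a pipeline like the one described, each sound file would yield an (F1, F2) pair that could then be passed to a classifier such as scikit-learn's `DecisionTreeClassifier`. The order of 4 in the demo matches the two synthetic resonances; real speech would typically need a higher order.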

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research