Intrinsic normalization and extrinsic denormalization of formant data of vowels
T.V. Ananthapadmanabha, A.G. Ramakrishnan

TL;DR
This paper combines a known speaker-intrinsic normalization of vowel formant data with a new speaker-extrinsic denormalization step, reducing talker variability and improving vowel classification accuracy.
Contribution
It proposes a novel speaker-extrinsic procedure that re-scales intrinsically normalized formant values by the mean formant values of vowels, correcting the vowel-space distortion that the normalization introduces.
Findings
Higher vowel classification accuracy than two top-ranked normalization procedures on the Peterson and Barney data
Reduced talker-induced spread in vowel formant data, yielding well-separated vowel clusters
Effective combination of intrinsic and extrinsic normalization techniques
Abstract
Using a known speaker-intrinsic normalization procedure, formant data are scaled by the reciprocal of the geometric mean of the first three formant frequencies. This reduces the influence of the talker but results in a distorted vowel space. The proposed speaker-extrinsic procedure re-scales the normalized values by the mean formant values of vowels. When tested on the formant data of vowels published by Peterson and Barney, the combined approach yields well-separated clusters by reducing the spread due to talkers. With vowel classification accuracy as the objective measure, the proposed procedure performs better than two top-ranked normalization procedures.
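The two steps described in the abstract can be sketched as follows. The intrinsic step follows the abstract directly: each token's formants are divided by G = (F1·F2·F3)^(1/3). The abstract does not spell out exactly how the mean formant values are applied in the extrinsic step, so the rescaling below is an assumption: one factor per formant, chosen so that the grand mean of the normalized vowel means maps back to the grand mean in Hz. The nearest-centroid classifier is likewise a hypothetical stand-in for the paper's classifier, and all function names are illustrative.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple

Formants = Tuple[float, float, float]  # (F1, F2, F3) in Hz
Token = Tuple[str, Formants]           # (vowel label, formants)


def intrinsic_normalize(f: Formants) -> Formants:
    """Divide each formant by the geometric mean of the token's own F1, F2, F3."""
    g = math.exp(sum(math.log(x) for x in f) / 3.0)
    return tuple(x / g for x in f)


def vowel_means(tokens: List[Token]) -> Dict[str, Formants]:
    """Mean formant vector for each vowel label, pooled over talkers."""
    groups: Dict[str, List[Formants]] = {}
    for vowel, f in tokens:
        groups.setdefault(vowel, []).append(f)
    return {v: tuple(mean(col) for col in zip(*fs)) for v, fs in groups.items()}


def extrinsic_scale(tokens: List[Token]) -> Formants:
    """ASSUMED rescaling: one factor per formant so that the grand mean of the
    normalized vowel means equals the grand mean of the raw vowel means (Hz)."""
    raw = vowel_means(tokens)
    norm = vowel_means([(v, intrinsic_normalize(f)) for v, f in tokens])
    raw_grand = [mean(m[i] for m in raw.values()) for i in range(3)]
    norm_grand = [mean(m[i] for m in norm.values()) for i in range(3)]
    return tuple(r / n for r, n in zip(raw_grand, norm_grand))


def denormalize(f: Formants, scale: Formants) -> Formants:
    """Intrinsic normalization followed by the (assumed) extrinsic rescaling."""
    return tuple(x * s for x, s in zip(intrinsic_normalize(f), scale))


def nearest_centroid_accuracy(train: List[Token], test: List[Token]) -> float:
    """Fraction of test tokens whose nearest vowel centroid (Euclidean) is correct."""
    centroids = vowel_means(train)
    correct = 0
    for vowel, f in test:
        pred = min(centroids, key=lambda v: math.dist(f, centroids[v]))
        correct += pred == vowel
    return correct / len(test)
```

Because the intrinsic step cancels any uniform scaling of a talker's formants, two talkers whose formants differ by a constant factor map to the same point; the per-formant rescaling then restores Hz-like ranges on each axis, so distances used by the classifier are no longer dominated by the compressed normalized scale.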
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
