Evaluating Features and Metrics for High-Quality Simulation of Early Vocal Learning of Vowels
Branislav Gerazov, Daniel van Niekerk, Anqi Xu, Paul Konstantin Krug, Peter Birkholz, and Yi Xu

TL;DR
This paper evaluates 40 feature-metric combinations for optimizing the production of static vowels with a high-quality articulatory synthesiser, with the aim of better understanding how infants learn to speak despite the acoustic mismatch of their vocal apparatus.
Contribution
It systematically assesses feature-metric pairs for vowel synthesis quality and explores their relation to perceptual and acoustic error evaluation.
Findings
Formant error and the projection of the feature-metric error surface onto the normalised F1-F2 formant space can be used to evaluate the impact of features and metrics.
Certain feature-metric combinations outperform others in vowel synthesis.
The approach provides insights into perceptual relevance of acoustic features.
Abstract
The way infants use auditory cues to learn to speak despite the acoustic mismatch of their vocal apparatus is a hot topic of scientific debate. The simulation of early vocal learning using articulatory speech synthesis offers a way towards gaining a deeper understanding of this process. One of the crucial parameters in these simulations is the choice of features and a metric to evaluate the acoustic error between the synthesised sound and the reference target. We contribute by evaluating the performance of a set of 40 feature-metric combinations for the task of optimising the production of static vowels with a high-quality articulatory synthesiser. To this end, we assess the usability of formant error and the projection of the feature-metric error surface in the normalised F1-F2 formant space. We show that this approach can be used to evaluate the impact of features and metrics…
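To make the idea of a feature-metric combination concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single hypothetical feature (a log-magnitude spectrum), two common metrics (Euclidean and cosine distance), and a simple min-max normalisation of F1 and F2; the paper's actual features, metrics, and normalisation are not given in this summary. The function names, formant ranges, and the two-sinusoid "vowel" are illustrative stand-ins only.

```python
import numpy as np


def log_spectrum(signal, n_fft=512):
    """Hypothetical feature: log-magnitude spectrum of one windowed frame."""
    frame = np.asarray(signal, dtype=float)[:n_fft]
    frame = frame * np.hanning(len(frame))
    return np.log(np.abs(np.fft.rfft(frame, n=n_fft)) + 1e-8)


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(a - b))


def cosine_distance(a, b):
    """One minus the cosine similarity of two feature vectors."""
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def feature_metric_error(target, candidate, feature_fn, metric_fn):
    """Acoustic error of a candidate sound under one feature-metric pair."""
    return metric_fn(feature_fn(target), feature_fn(candidate))


def normalised_formant_error(target_hz, candidate_hz, low_hz, high_hz):
    """Distance in a min-max normalised F1-F2 plane (assumed normalisation)."""
    low = np.asarray(low_hz, dtype=float)
    high = np.asarray(high_hz, dtype=float)
    t = (np.asarray(target_hz, dtype=float) - low) / (high - low)
    c = (np.asarray(candidate_hz, dtype=float) - low) / (high - low)
    return float(np.linalg.norm(t - c))


def toy_vowel(f1, f2, sample_rate=16000, n_samples=512):
    """Toy stand-in for a synthesised vowel: two sinusoids at F1 and F2."""
    t = np.arange(n_samples) / sample_rate
    return np.sin(2 * np.pi * f1 * t) + 0.5 * np.sin(2 * np.pi * f2 * t)


# Compare one candidate against a target under each metric.
METRICS = {"euclidean": euclidean, "cosine": cosine_distance}
target = toy_vowel(700, 1200)
candidate = toy_vowel(400, 2000)

for name, metric in METRICS.items():
    error = feature_metric_error(target, candidate, log_spectrum, metric)
    print(f"log_spectrum + {name}: {error:.4f}")

# Illustrative formant ranges in Hz: F1 in [200, 1000], F2 in [500, 3000].
formant_error = normalised_formant_error(
    (700, 1200), (400, 2000), low_hz=(200, 500), high_hz=(1000, 3000)
)
print(f"normalised F1-F2 error: {formant_error:.4f}")
```

In an optimisation loop, the feature-metric error would play the role of the objective to minimise, while the normalised formant error offers an independent check on how close the resulting vowel is to the target in formant space.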
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Phonetics and Phonology Research
