Recovering implicit pitch contours from formants in whispered speech
Pablo Pérez Zarazaga, Zofia Malisz

TL;DR
This paper presents a machine learning approach to estimate implicit pitch contours from whispered speech formants, revealing prosodic information despite the absence of fundamental frequency.
Contribution
It introduces a two-step method combining denoising autoencoders and formant analysis to recover implicit pitch contours in whispering, a novel approach in whispered speech analysis.
Findings
Effective correlation between whispered and phonated formants established
Implicit pitch contours can be inferred from whispered formant data
Method demonstrates potential for prosody analysis in whispered speech
Abstract
Whispered speech is characterised by a noise-like excitation that results in the lack of fundamental frequency. Considering that prosodic phenomena such as intonation are perceived through f0 variation, the perception of whispered prosody is relatively difficult. At the same time, studies have shown that speakers do attempt to produce intonation when whispering and that prosodic variability is being transmitted, suggesting that intonation "survives" in whispered formant structure. In this paper, we aim to estimate the way in which formant contours correlate with an "implicit" pitch contour in whisper, using a machine learning model. We propose a two-step method: using a parallel corpus, we first transform the whispered formants into their phonated equivalents using a denoising autoencoder. We then analyse the formant contours to predict phonated pitch contour variation. We observe that…
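The two-step method described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic stand-ins for a parallel corpus, the "autoencoder" is a small one-hidden-layer network trained in the denoising style (whispered formants in, phonated formants out), and the pitch predictor is a plain least-squares regression. The paper's actual architecture, features, and training setup are not given in this summary and may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic stand-in for a parallel corpus (toy data, not from the paper) ---
n_frames = 500
f0 = 120.0 + 30.0 * np.sin(np.linspace(0.0, 6.0 * np.pi, n_frames))  # Hz
base_formants = np.array([500.0, 1500.0, 2500.0])  # F1..F3 in Hz
# Toy assumption: phonated formants drift slightly with f0.
phonated = (base_formants
            + np.outer(f0 - 120.0, [0.5, 1.0, 0.8])
            + rng.normal(0.0, 5.0, (n_frames, 3)))
# Toy assumption: whispered formants are shifted upward and noisier.
whispered = (phonated + np.array([80.0, 60.0, 40.0])
             + rng.normal(0.0, 20.0, (n_frames, 3)))


def standardise(x):
    """Return z-scored features (zero mean, unit variance per column)."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def train_mapper(x_in, x_target, hidden=8, lr=0.1, epochs=1000):
    """Step 1 (hypothetical helper): map noisy inputs to clean targets
    with a one-hidden-layer tanh network and full-batch gradient descent."""
    w1 = rng.normal(0.0, 0.5, (x_in.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, (hidden, x_target.shape[1]))
    b2 = np.zeros(x_target.shape[1])
    losses = []
    for _ in range(epochs):
        h = np.tanh(x_in @ w1 + b1)
        err = h @ w2 + b2 - x_target
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error.
        g_out = 2.0 * err / err.size
        g_h = (g_out @ w2.T) * (1.0 - h ** 2)
        w2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        w1 -= lr * (x_in.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return (lambda x: np.tanh(x @ w1 + b1) @ w2 + b2), losses


def fit_pitch_regressor(features, pitch):
    """Step 2 (hypothetical helper): least-squares map from formants to f0."""
    design = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(design, pitch, rcond=None)
    return lambda f: np.column_stack([f, np.ones(len(f))]) @ coef


xw = standardise(whispered)
xp = standardise(phonated)

mapper, losses = train_mapper(xw, xp)    # whispered -> phonated-like formants
mapped = mapper(xw)
predict_f0 = fit_pitch_regressor(mapped, f0)
f0_hat = predict_f0(mapped)              # "implicit" pitch contour estimate

corr = float(np.corrcoef(f0, f0_hat)[0, 1])
print(f"mapping loss: {losses[0]:.3f} -> {losses[-1]:.3f}; f0 correlation: {corr:.2f}")
```

This sketch fits and evaluates on the same frames, so the printed correlation only shows that the pipeline runs end to end. A real experiment would hold out speakers or utterances and use measured formant tracks rather than simulated ones.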
Taxonomy
Topics: Phonetics and Phonology Research · Speech and Audio Processing · Speech Recognition and Synthesis
