A Variational Prosody Model for Mapping the Context-Sensitive Variation of Functional Prosodic Prototypes
Branislav Gerazov, Gérard Bailly, Omar Mohammed, Yi Xu, and Philip N. Garner

TL;DR
This paper introduces a Variational Prosody Model (VPM) that combines generative prosody modeling with deep learning to interpret and generate context-sensitive intonation variations, leveraging big data without specialized corpora.
Contribution
It proposes a novel VPM built on the Superposition of Functional Contours (SFC) paradigm, enabling interpretable, context-sensitive prosody modeling with meaningful latent-space representations.
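The superposition principle can be illustrated with a minimal sketch that overlap-adds per-function contours into one utterance-level contour. This is not the authors' implementation. The sketch assumes that each contour is a short array of pitch offsets (in semitones), with one value per syllable, aligned to consecutive syllables starting at the onset of the function's scope. The helper `superpose`, the function names, the scopes and the values are all hypothetical.

```python
import numpy as np

def superpose(n_syllables, contours):
    """Overlap-add functional contours into one utterance-level contour.

    contours: list of (onset, values) pairs, where `onset` is the index of
    the first syllable in the function's scope and `values` holds one pitch
    offset per syllable in that scope (a hypothetical representation).
    """
    total = np.zeros(n_syllables)
    for onset, values in contours:
        values = np.asarray(values, dtype=float)
        # Each function contributes only within its own scope; overlapping
        # scopes simply add up.
        total[onset:onset + len(values)] += values
    return total

# Hypothetical example: a 5-syllable utterance carrying an utterance-level
# declination contour and a focus contour spanning syllables 2-3.
declination = (0, [2.0, 1.5, 1.0, 0.5, 0.0])
focus = (2, [3.0, -1.0])
result = superpose(5, [declination, focus])
# per-syllable totals: 2.0, 1.5, 4.0, -0.5, 0.0
```

In a full SFC-style model, each contour would be produced by a trained generator and would also cover durations. The additive combination is the part this sketch is meant to show.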
Findings
VPM captures intrinsic variability of prosodic prototypes.
The model effectively represents multi-dimensional context variability.
VPM generates natural, dynamic prosody contours in synthesis.
Abstract
The quest for comprehensive generative models of intonation that link linguistic and paralinguistic functions to prosodic forms has been a longstanding challenge of speech communication research. Traditional intonation models have given way to the overwhelming performance of deep learning (DL) techniques for training general-purpose end-to-end mappings using millions of tunable parameters. The shift towards black-box machine learning models has nonetheless posed the reverse problem: a compelling need to discover knowledge, to explain, visualise and interpret. Our work bridges the gap between a comprehensive generative model of intonation and state-of-the-art DL techniques. We build upon the modelling paradigm of the Superposition of Functional Contours (SFC) model and propose a Variational Prosody Model (VPM) that uses a network of variational contour generators to capture the…
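The abstract does not specify how a variational contour generator is built. The following is therefore only a hedged sketch that assumes a VAE-style design, with three ingredients:
- a Gaussian latent code sampled using the reparameterization trick;
- a KL penalty towards a standard-normal prior;
- a decoder that maps the latent code to a contour.

The linear decoder, the dimensions and the random weights are hypothetical stand-ins for learned networks. The sketch only shows how different latent samples can yield different variants of the same prototype contour.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 2  # hypothetical: a 2-D latent space is easy to visualise
N_SYLL = 4      # hypothetical scope length in syllables

# Hypothetical decoder weights; a real model would learn these.
W_proto = rng.normal(size=N_SYLL)                  # prototype contour shape
W_latent = rng.normal(size=(N_SYLL, LATENT_DIM))   # how z bends the prototype

def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def generate_contour(z):
    """Linear stand-in decoder: prototype plus latent-driven variation."""
    return W_proto + W_latent @ z

# Hypothetical posterior parameters for one function in one context.
mu = np.array([0.5, -0.2])
log_var = np.array([-1.0, -1.0])
variants = [generate_contour(sample_latent(mu, log_var, rng)) for _ in range(3)]
kl = kl_to_standard_normal(mu, log_var)
```

During training, a term like `kl` would be added to the reconstruction loss. This keeps the latent space close to the prior, which is what makes a low-dimensional latent space meaningful to explore.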
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Topic Modeling
