A Variational Prosody Model for Mapping the Context-Sensitive Variation of Functional Prosodic Prototypes

Branislav Gerazov, Gérard Bailly, Omar Mohammed, Yi Xu, and Philip N. Garner

arXiv:1806.08685 · eess.AS · March 19, 2019 · 5 cites

TL;DR

This paper introduces a Variational Prosody Model (VPM) that combines generative prosody modeling with deep learning to interpret and generate context-sensitive intonation variations, leveraging big data without specialized corpora.

Contribution

It proposes a novel VPM based on the Superposition of Functional Contours (SFC) paradigm, enabling interpretable, context-sensitive prosody modeling with meaningful latent-space representations.
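The superposition idea that the VPM builds on can be illustrated with a minimal NumPy sketch. Each function (for example, a sentence-level attitude or a word-level emphasis) contributes an elementary contour over its own scope, and the prosody of the utterance is the sum of these overlapping contributions. This is a heavy simplification and is not taken from the paper or the prosodeep repository: it assumes one pitch value per syllable, whereas SFC-style models predict richer prosodic parameters per rhythmic unit. The function name `superpose_contours` and the example contour values are hypothetical.

```python
import numpy as np


def superpose_contours(n_syllables, contours):
    """Sum elementary functional contours into one utterance-level contour.

    Hypothetical helper. `contours` is a list of (start, values) pairs, where
    `values` holds per-syllable pitch offsets (assumed unit: semitones)
    applied to syllables start .. start + len(values) - 1.
    """
    total = np.zeros(n_syllables)
    for start, values in contours:
        values = np.asarray(values, dtype=float)
        end = start + len(values)
        if start < 0 or end > n_syllables:
            raise ValueError("contour scope falls outside the utterance")
        total[start:end] += values  # overlapping contours simply add up
    return total


# Illustrative values: a falling declarative contour over all five syllables
# plus an emphasis contour on syllables 1 and 2.
attitude = (0, [0.0, -0.5, -1.0, -1.5, -2.0])
emphasis = (1, [2.0, 1.0])
f0 = superpose_contours(5, [attitude, emphasis])
```

Here `f0` is `[0.0, 1.5, 0.0, -1.5, -2.0]`: the emphasis bump rides on top of the declarative fall. Keeping each function's contribution separate is what makes such a model interpretable, since every term in the sum can be inspected on its own.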

Findings

1. VPM captures intrinsic variability of prosodic prototypes.
2. The model effectively represents multi-dimensional context variability.
3. VPM generates natural, dynamic prosody contours in synthesis.

Abstract

The quest for comprehensive generative models of intonation that link linguistic and paralinguistic functions to prosodic forms has been a longstanding challenge of speech communication research. Traditional intonation models have given way to the overwhelming performance of deep learning (DL) techniques for training general purpose end-to-end mappings using millions of tunable parameters. The shift towards black box machine learning models has nonetheless posed the reverse problem -- a compelling need to discover knowledge, to explain, visualise and interpret. Our work bridges between a comprehensive generative model of intonation and state-of-the-art DL techniques. We build upon the modelling paradigm of the Superposition of Functional Contours (SFC) model and propose a Variational Prosody Model (VPM) that uses a network of variational contour generators to capture the…
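To make the notion of a "variational contour generator" more concrete, here is a toy NumPy sketch under VAE-style assumptions. A latent vector z is sampled with the reparameterization trick and paired with each rhythmic unit's normalized position inside the function's scope. A small network then outputs one pitch value per unit, and a KL term measures how far the latent distribution is from a standard normal. The class name, the 2-D latent size, the single-hidden-layer decoder, and the random weights standing in for trained parameters are all illustrative assumptions. The authors' PyTorch implementation (gerazov/prosodeep) may be structured differently.

```python
import numpy as np


class VariationalContourGenerator:
    """Toy variational contour generator (illustrative, not the authors' code)."""

    def __init__(self, n_latent=2, n_hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        # Hypothetical random weights standing in for trained parameters.
        self.w_in = rng.normal(scale=0.5, size=(1 + n_latent, n_hidden))
        self.w_out = rng.normal(scale=0.5, size=(n_hidden,))
        self.n_latent = n_latent

    @staticmethod
    def sample_latent(mu, log_var, rng):
        # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).
        eps = rng.standard_normal(mu.shape)
        return mu + np.exp(0.5 * log_var) * eps

    def generate(self, positions, z):
        # Pair every rhythmic-unit position with the same latent vector z.
        x = np.column_stack([positions, np.tile(z, (len(positions), 1))])
        hidden = np.tanh(x @ self.w_in)
        return hidden @ self.w_out  # one pitch value per rhythmic unit


def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, exp(log_var)) || N(0, I) ), summed over latent dimensions.
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))


gen = VariationalContourGenerator()
positions = np.linspace(0.0, 1.0, 5)  # normalized positions within the scope
mu, log_var = np.zeros(2), np.zeros(2)
z = gen.sample_latent(mu, log_var, np.random.default_rng(1))
contour = gen.generate(positions, z)
```

Drawing different z values for the same positions yields differently shaped contours. This is one way such a generator could express variation around a prototype. In a trained VAE-style model, `mu` and `log_var` would typically come from an encoder rather than being fixed at zero.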

Code & Models

Repositories

gerazov/prosodeep
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Topic Modeling