Applying the Information Bottleneck Principle to Prosodic Representation Learning

Guangyan Zhang; Ying Qin; Daxin Tan; Tan Lee

arXiv:2108.02821·eess.AS·August 9, 2021

Open Access

TL;DR

This paper introduces a neural speech generation model that applies the information bottleneck principle to learn controllable, word-level prosodic representations capable of speech reconstruction and prosody transfer.

Contribution

The paper proposes an IB-based neural network with a modified VQ-VAE quantization layer for learning and controlling prosodic representations in speech generation.

Findings

01. Effective prosody transfer demonstrated

02. IB capacity tuning improves representation quality

03. Model achieves high speech reconstruction fidelity

Abstract

This paper describes a novel design of a neural network-based speech generation model for learning prosodic representation. The problem of representation learning is formulated according to the information bottleneck (IB) principle. A modified VQ-VAE quantization layer is incorporated in the speech generation model to control the IB capacity and adjust the balance between the reconstruction power and the disentanglement capability of the learned representation. The proposed model is able to learn word-level prosodic representations from speech data. With an optimized IB capacity, the learned representations are not only adequate to reconstruct the original speech but can also be used to transfer the prosody onto different textual content. Extensive results of the objective and subjective evaluation are presented to demonstrate the effect of IB capacity control, the effectiveness, and potential usage of…
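
To make the idea of capacity control concrete, the following is a minimal sketch of a standard vector-quantization lookup, not the authors' modified layer, whose details are not given in this summary. It assumes nearest-neighbour assignment under Euclidean distance; the names `quantize`, `capacity_bits`, `word_features`, and `codebook`, as well as all sizes, are hypothetical. The point it illustrates is that a codebook with K entries can carry at most log2(K) bits per quantized vector, so the codebook size acts as one possible knob on bottleneck capacity.

```python
import numpy as np


def quantize(z, codebook):
    """Replace each row of z with its nearest codebook entry (Euclidean)."""
    # Pairwise squared distances, shape (num_vectors, num_codes).
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx


def capacity_bits(num_codes):
    """Upper bound on the information carried by one code index, in bits."""
    return float(np.log2(num_codes))


# Hypothetical toy data: 5 word-level feature vectors of dimension 4,
# quantized against an 8-entry codebook (assumed sizes, not from the paper).
rng = np.random.default_rng(0)
word_features = rng.normal(size=(5, 4))
codebook = rng.normal(size=(8, 4))

quantized, indices = quantize(word_features, codebook)
bits_per_word = capacity_bits(codebook.shape[0])
```

Under these assumptions, shrinking the codebook lowers `bits_per_word` and forces a coarser representation, while enlarging it allows more detail to pass through, which is the reconstruction versus disentanglement trade-off the abstract refers to.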

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing

Methods: VQ-VAE