Speaker-independent raw waveform model for glottal excitation

Lauri Juvela; Vassilis Tsiaras; Bajibabu Bollepalli; Manu Airaksinen; Junichi Yamagishi; Paavo Alku

arXiv:1804.09593 · eess.AS · April 26, 2018

TL;DR

This paper introduces GlotNet, a speaker-independent vocoder that uses a WaveNet to generate glottal excitation waveforms within a source-filter model of speech production, so that a multi-speaker waveform generator can be trained with limited data and computation.

Contribution

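The paper presents a multi-speaker GlotNet vocoder in which a WaveNet generates glottal excitation waveforms rather than speech waveforms directly. Leveraging the source-filter model in this way reduces the training data and computation needed for a speaker-independent waveform generator.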
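As a rough illustration of the source-filter idea, the sketch below treats speech as an excitation signal passed through an all-pole filter, and shows that inverse filtering recovers the excitation. It is a toy example, not the paper's implementation: the pulse-train excitation, the second-order coefficients in `a`, and the function names `all_pole_filter` and `inverse_filter` are all assumptions chosen for illustration. In a GlotNet-style vocoder the excitation would come from a trained WaveNet rather than a fixed pulse train.

```python
def all_pole_filter(excitation, a):
    """Synthesis filter 1/A(z): s[n] = e[n] - sum_k a[k] * s[n-k], k = 1..p.

    `a` holds the coefficients a[1..p] of A(z) = 1 + sum_k a[k] z^-k
    (the leading 1 is implicit).
    """
    speech = []
    for n, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * speech[n - k]
        speech.append(acc)
    return speech


def inverse_filter(speech, a):
    """Analysis filter A(z): e[n] = s[n] + sum_k a[k] * s[n-k]."""
    excitation = []
    for n, s in enumerate(speech):
        acc = s
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * speech[n - k]
        excitation.append(acc)
    return excitation


# Hypothetical toy excitation: one unit pulse every 8 samples.
excitation = [1.0 if n % 8 == 0 else 0.0 for n in range(32)]

# Hypothetical stable 2nd-order filter (pole radius is sqrt(0.5) < 1).
a = [-0.9, 0.5]

speech = all_pole_filter(excitation, a)   # excitation -> "speech"
recovered = inverse_filter(speech, a)     # "speech" -> excitation
```

The round trip is exact up to floating-point error, which is why a waveform model can be trained on the inverse-filtered excitation and its output filtered back into speech.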

Findings

01

GlotNet compares favorably with direct WaveNet vocoders in listening tests.

02

The source-filter approach enables effective multi-speaker modeling with limited resources.

03

The model improves speech quality over classical vocoders in multi-speaker scenarios.

Abstract

Recent speech technology research has seen a growing interest in using WaveNets as statistical vocoders, i.e., generating speech waveforms from acoustic features. These models have been shown to improve the generated speech quality over classical vocoders in many tasks, such as text-to-speech synthesis and voice conversion. Furthermore, conditioning WaveNets with acoustic features allows sharing the waveform generator model across multiple speakers without additional speaker codes. However, multi-speaker WaveNet models require large amounts of training data and computation to cover the entire acoustic space. This paper proposes leveraging the source-filter model of speech production to more effectively train a speaker-independent waveform generator with limited resources. We present a multi-speaker 'GlotNet' vocoder, which utilizes a WaveNet to generate glottal excitation waveforms,…

Taxonomy

Methods: Mixture of Logistic Distributions · Dilated Causal Convolution · WaveNet