PeriodNet: A non-autoregressive waveform generation model with a structure separating periodic and aperiodic components

Yukiya Hono; Shinji Takaki; Kei Hashimoto; Keiichiro Oura; Yoshihiko Nankaku; Keiichi Tokuda

arXiv:2102.07786·eess.AS·February 17, 2021·1 cites

TL;DR

PeriodNet is a novel non-autoregressive speech waveform generation model that explicitly separates periodic and aperiodic components, improving naturalness and robustness in generated speech, especially for unseen pitch ranges.

Contribution

It introduces a new structure that models periodic and aperiodic components separately without prior decomposition, enhancing speech synthesis quality.

Findings

01 Improves naturalness of generated speech waveforms.

02 Effective for pitches outside training data range.

03 Outperforms existing models in subjective evaluations.

Abstract

We propose PeriodNet, a non-autoregressive (non-AR) waveform generation model with a new model structure for modeling periodic and aperiodic components in speech waveforms. Non-AR waveform generation models can generate speech waveforms in parallel and can be used as speech vocoders by conditioning on acoustic features. Since a speech waveform contains periodic and aperiodic components, both should be modeled appropriately to generate a high-quality speech waveform. However, it is difficult to decompose a natural speech waveform into these components in advance. To address this issue, we propose a parallel model structure and a series model structure that separate the periodic and aperiodic components. The key features of the proposed models are that explicit periodic and aperiodic signals are taken as input and that no external periodic/aperiodic decomposition is needed in training. Experiments…
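The idea of feeding explicit periodic and aperiodic input signals into two generators can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not taken from the abstract: the choice of a sine wave driven by F0 as the periodic input and Gaussian noise as the aperiodic input, the names `make_excitations`, `parallel_model`, and `series_model`, the sample rate and hop size, and the way the two generators are combined. The generators themselves are stand-in callables; in a real model they would be trained neural networks conditioned on acoustic features.

```python
import numpy as np


def make_excitations(f0_frames, sample_rate=16000, hop_size=80, seed=0):
    """Build sample-level periodic and aperiodic input signals (illustrative).

    f0_frames: per-frame F0 in Hz, with 0 marking unvoiced frames
    (an assumed convention, not specified by the source).
    """
    # Upsample the frame-level F0 contour to one value per sample.
    f0 = np.repeat(np.asarray(f0_frames, dtype=np.float64), hop_size)
    voiced = f0 > 0
    # Integrate instantaneous frequency to obtain phase, then take the sine.
    phase = 2.0 * np.pi * np.cumsum(f0 / sample_rate)
    periodic = np.where(voiced, np.sin(phase), 0.0)
    # Aperiodic input: white Gaussian noise of the same length.
    rng = np.random.default_rng(seed)
    aperiodic = rng.standard_normal(f0.shape[0])
    return periodic, aperiodic


def parallel_model(periodic, aperiodic, gen_p, gen_a):
    """Assumed parallel structure: two independent generators, outputs summed."""
    return gen_p(periodic) + gen_a(aperiodic)


def series_model(periodic, aperiodic, gen_p, gen_a_cond):
    """Assumed series structure: the second generator also sees the
    output of the periodic generator."""
    periodic_wave = gen_p(periodic)
    return gen_a_cond(aperiodic, periodic_wave)


# Hypothetical stand-in generators (simple scalings instead of trained networks).
gen_p = lambda x: 0.8 * x
gen_a = lambda n: 0.1 * n
gen_a_cond = lambda n, p: p + 0.1 * n

periodic, aperiodic = make_excitations([0.0, 100.0, 120.0, 0.0])
wave_parallel = parallel_model(periodic, aperiodic, gen_p, gen_a)
wave_series = series_model(periodic, aperiodic, gen_p, gen_a_cond)
```

Because the periodic input is built directly from the F0 contour, a sketch like this can be driven with any pitch value, which is one plausible intuition for why such a structure might cope with pitches outside the training range. Note that no decomposition of a natural waveform appears anywhere in the sketch, matching the abstract's point that none is needed in training.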

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing