WaveCycleGAN: Synthetic-to-natural speech waveform conversion using cycle-consistent adversarial networks

Kou Tanaka; Takuhiro Kaneko; Nobukatsu Hojo; Hirokazu Kameoka

arXiv:1809.10288 · eess.AS · October 2, 2018 · 5 cites

TL;DR

WaveCycleGAN introduces a cycle-consistent adversarial network approach to directly convert synthetic speech waveforms into natural-sounding speech, improving naturalness without relying on vocoders or explicit waveform assumptions.

Contribution

This paper presents a novel waveform-level conversion method using cycle-consistent adversarial networks that enhances speech naturalness and reduces over-smoothing effects.

Findings

01. Significantly improves speech naturalness in synthetic-to-natural conversion.

02. Reduces over-smoothing effects in acoustic features.

03. Operates directly on waveforms without explicit waveform assumptions.

Abstract

We propose a learning-based filter that directly converts a synthetic speech waveform into a natural speech waveform. Speech-processing systems built on a vocoder framework, such as statistical parametric speech synthesis and voice conversion, are especially convenient when data are limited because they represent and process interpretable acoustic features, such as the fundamental frequency (F0) and mel-cepstrum, over a compact space. However, a well-known cause of quality degradation in the generated speech is the over-smoothing effect, which eliminates some of the detailed structure of the generated or converted acoustic features. To address this issue, we propose a synthetic-to-natural speech waveform conversion technique that uses cycle-consistent adversarial networks and does not require any explicit assumption about the speech waveform in adversarial…
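As a rough illustration of the cycle-consistency idea named in the abstract, the sketch below computes the generic CycleGAN-style L1 cycle loss on two waveforms. It is not the paper's model: the two "generators" are hypothetical fixed FIR filters standing in for learned networks, the function names and signal lengths are assumptions made for this example, and the adversarial loss terms are left out entirely.

```python
import numpy as np

def apply_filter(waveform, kernel):
    """Toy stand-in for a generator: a 1-D FIR filter with same-length output."""
    return np.convolve(waveform, kernel, mode="same")

def cycle_consistency_loss(x_syn, y_nat, k_s2n, k_n2s):
    """Generic L1 cycle loss.

    x_syn -> synthetic-to-natural -> natural-to-synthetic should return to x_syn,
    and y_nat -> natural-to-synthetic -> synthetic-to-natural should return to y_nat.
    """
    x_cycled = apply_filter(apply_filter(x_syn, k_s2n), k_n2s)
    y_cycled = apply_filter(apply_filter(y_nat, k_n2s), k_s2n)
    return np.mean(np.abs(x_cycled - x_syn)) + np.mean(np.abs(y_cycled - y_nat))

# Random signals used as placeholders for synthetic and natural waveforms.
rng = np.random.default_rng(0)
x_syn = rng.standard_normal(1600)
y_nat = rng.standard_normal(1600)

# Hypothetical "generators": an identity filter and a smoothing filter.
identity = np.array([0.0, 1.0, 0.0])
smoothing = np.array([0.25, 0.5, 0.25])

loss_identity = cycle_consistency_loss(x_syn, y_nat, identity, identity)
loss_smoothing = cycle_consistency_loss(x_syn, y_nat, smoothing, smoothing)
print(loss_identity, loss_smoothing)
```

A mapping pair that can be inverted exactly (here, the identity filters) gives zero cycle loss, while a pair that discards detail (two smoothing passes) is penalized. In a CycleGAN-style setup this term is combined with adversarial losses, and the two waveform sets do not need to be paired sample by sample.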

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing