A Fully Time-domain Neural Model for Subband-based Speech Synthesizer

Azam Rabiee; Geonmin Kim; Tae-Ho Kim; Soo-Young Lee

arXiv:1810.05319 · eess.AS · November 28, 2022

TL;DR

This paper presents a novel fully time-domain neural network for subband speech synthesis that reduces complexity and achieves high-quality speech generation comparable to fullband models.

Contribution

It introduces a subband-based TTS model that uses wavelet analysis and CNNs, achieving a simpler architecture with speech quality comparable to fullband models.

Findings

01

In ground-truth experiments with teacher forcing, the subband synthesizer significantly outperforms the fullband model on both subjective and objective measures.

02

Uses multi-level wavelet analysis/synthesis to decompose and reconstruct the signal into subbands in the time domain.

03

Conditioned on the phoneme sequence from a pronunciation dictionary, achieves nearly end-to-end subband TTS with quality comparable to fullband models.

Abstract

This paper introduces a deep neural network model for a subband-based speech synthesizer. The model exploits the narrow bandwidth of the subband signals to reduce the complexity of the time-domain speech generator. We employ multi-level wavelet analysis/synthesis to decompose the signal into subbands and reconstruct it, entirely in the time domain. Inspired by WaveNet, a convolutional neural network (CNN) model predicts the subband speech signals fully in the time domain. Because the subbands are narrow, a simple network architecture is enough to learn their simple patterns accurately. In ground-truth experiments with teacher forcing, the subband synthesizer significantly outperforms the fullband model in terms of both subjective and objective measures. In addition, by conditioning the model on the phoneme sequence using a pronunciation dictionary, we have achieved the…
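The multi-level wavelet analysis/synthesis step mentioned in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not say which wavelet family or how many levels the authors use, so the Haar wavelet, the three-level depth, and the function names `haar_analysis` and `haar_synthesis` are all assumptions made here for illustration, not the paper's implementation.

```python
import numpy as np


def haar_analysis(x, levels):
    """Split x into [approx_L, detail_L, ..., detail_1] with the Haar wavelet.

    Assumption: Haar is used only for illustration; the paper's wavelet
    family is not given in the abstract.
    """
    approx = np.asarray(x, dtype=np.float64)
    if len(approx) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        # High-pass branch: half-rate detail (upper half of the current band).
        details.append((even - odd) / np.sqrt(2.0))
        # Low-pass branch: half-rate approximation, split again at the next level.
        approx = (even + odd) / np.sqrt(2.0)
    # Coarsest approximation first, then details from coarsest to finest.
    return [approx] + details[::-1]


def haar_synthesis(subbands):
    """Invert haar_analysis: rebuild the fullband signal from its subbands."""
    approx = subbands[0]
    for detail in subbands[1:]:
        out = np.empty(2 * len(approx))
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx


# Toy signal: 16 samples of one sine period.
x = np.sin(2 * np.pi * np.arange(16) / 16)
bands = haar_analysis(x, levels=3)
print([len(b) for b in bands])                # [2, 2, 4, 8]
print(np.allclose(haar_synthesis(bands), x))  # True
```

Each level halves the sample rate of the band it splits, so every subband is a shorter, narrower-band sequence than the input. This is the property the abstract relies on when it says a simple network is enough per subband, and the perfect reconstruction shows that the decomposition itself loses no information.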

Code & Models

Repositories

AzamRabiee/subband-TTS
tf · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing