A Multi-Discriminator CycleGAN for Unsupervised Non-Parallel Speech Domain Adaptation

Ehsan Hosseini-Asl; Yingbo Zhou; Caiming Xiong; Richard Socher

arXiv:1804.00522·cs.CL·July 11, 2018

TL;DR

This paper introduces a multi-discriminator CycleGAN model for unsupervised speech domain adaptation, improving speech recognition accuracy and naturalness across different speaker genders without parallel data.

Contribution

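It proposes a CycleGAN variant with multiple independent discriminators, each responsible for a different frequency band of the power spectrogram, so that the generator produces more realistic domain-adapted spectrograms for speech recognition.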

Findings

01. Achieved 7.41% phoneme error rate reduction on TIMIT.

02. Achieved 11.10% word error rate reduction on WSJ.

03. Generated more natural sounding speech in cross-domain conditions.

Abstract

Domain adaptation plays an important role for speech recognition models, particularly for low-resource domains. We propose a novel generative model based on the cycle-consistent generative adversarial network (CycleGAN) for unsupervised non-parallel speech domain adaptation. The proposed model employs multiple independent discriminators on the power spectrogram, each in charge of a different frequency band. As a result we have 1) better discriminators that focus on fine-grained details of the frequency features, and 2) a generator that is capable of generating more realistic domain-adapted spectrograms. We demonstrate the effectiveness of our method on speech recognition with gender adaptation, where the model only has access to supervised data from one gender during training, but is evaluated on the other at test time. Our model is able to achieve an average of $7.41\%$ on…
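
The idea of assigning one discriminator per frequency band can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the contiguous, non-overlapping band split, the toy logistic `BandDiscriminator`, the standard GAN cross-entropy loss, and the plain sum over bands are all assumptions made for illustration, and every name below is hypothetical.

```python
import numpy as np


def split_frequency_bands(spectrogram, num_bands):
    """Split a (freq_bins, frames) spectrogram into contiguous frequency bands.

    Assumption: bands are contiguous and non-overlapping.
    """
    return np.array_split(spectrogram, num_bands, axis=0)


class BandDiscriminator:
    """Hypothetical toy discriminator: a logistic score on time-averaged features.

    A real model would be a neural network; this stand-in only shows the wiring.
    """

    def __init__(self, band_size, rng):
        self.w = rng.normal(scale=0.1, size=band_size)
        self.b = 0.0

    def __call__(self, band):
        features = band.mean(axis=1)          # average over time frames
        logit = features @ self.w + self.b
        return 1.0 / (1.0 + np.exp(-logit))   # probability the band is "real"


def multi_discriminator_loss(real, fake, discriminators):
    """Sum the standard GAN discriminator loss over all frequency bands.

    Assumption: per-band losses are combined by an unweighted sum.
    """
    real_bands = split_frequency_bands(real, len(discriminators))
    fake_bands = split_frequency_bands(fake, len(discriminators))
    eps = 1e-8  # avoids log(0)
    loss = 0.0
    for disc, real_band, fake_band in zip(discriminators, real_bands, fake_bands):
        loss += -(np.log(disc(real_band) + eps) + np.log(1.0 - disc(fake_band) + eps))
    return loss


rng = np.random.default_rng(0)
freq_bins, frames, num_bands = 128, 40, 4

# Random arrays stand in for a real and a generated power spectrogram.
real = rng.random((freq_bins, frames))
fake = rng.random((freq_bins, frames))

bands = split_frequency_bands(real, num_bands)
discriminators = [BandDiscriminator(band.shape[0], rng) for band in bands]
loss = multi_discriminator_loss(real, fake, discriminators)
print(f"{num_bands} bands of shape {bands[0].shape}, total discriminator loss = {loss:.4f}")
```

Because each discriminator sees only its own slice of the frequency axis, it cannot rely on cues from other bands, which is one plausible reading of the abstract's claim that the discriminators focus on fine-grained frequency details.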
