CycleGAN-Based Unpaired Speech Dereverberation

Hannah Muckenhirn; Aleksandr Safin; Hakan Erdogan; Felix de Chaumont Quitry; Marco Tagliasacchi; Scott Wisdom; John R. Hershey

arXiv:2203.15652 · eess.AS · March 30, 2022

TL;DR

This paper introduces a CycleGAN-based speech dereverberation method that trains on unpaired data and performs comparably to a paired-data model with the same architecture in both objective and subjective evaluations.

Contribution

The paper presents a CycleGAN-based approach that lets dereverberation models be trained without paired dry and reverberant utterances, avoiding the cost of acquiring real paired data.

Findings

01

Unpaired model performs comparably to paired model on objective metrics.

02

Subjective quality of unpaired model matches that of paired model on unseen data.

03

CycleGAN enables effective dereverberation training with unpaired data.

Abstract

Typically, neural network-based speech dereverberation models are trained on paired data, composed of a dry utterance and its corresponding reverberant utterance. The main limitation of this approach is that such models can only be trained on large amounts of data and a variety of room impulse responses when the data is synthetically reverberated, since acquiring real paired data is costly. In this paper we propose a CycleGAN-based approach that enables dereverberation models to be trained on unpaired data. We quantify the impact of using unpaired data by comparing the proposed unpaired model to a paired model with the same architecture and trained on the paired version of the same dataset. We show that the performance of the unpaired model is comparable to the performance of the paired model on two different datasets, according to objective evaluation metrics. Furthermore, we run two…
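To make the unpaired setting concrete, here is a minimal sketch of the generic CycleGAN cycle-consistency term applied to independent batches of dry and reverberant audio. The abstract does not state the paper's exact objective, so this is an illustration of the general technique only: the function names, the L1 distance, and the toy scaling "generators" are all assumptions, and the adversarial losses that a full CycleGAN also needs are omitted.

```python
import numpy as np

def l1(a, b):
    """Mean absolute error between two arrays."""
    return float(np.mean(np.abs(a - b)))

def cycle_consistency_loss(dry, reverberant, dereverb, reverb):
    """Generic CycleGAN cycle loss on unpaired batches (illustrative only).

    dry and reverberant are independent batches: no utterance in one
    batch needs a counterpart in the other.
    dereverb maps reverberant -> dry; reverb maps dry -> reverberant.
    """
    # dry -> reverb -> dereverb should reconstruct the original dry batch
    dry_cycle = dereverb(reverb(dry))
    # reverberant -> dereverb -> reverb should reconstruct the reverberant batch
    rev_cycle = reverb(dereverb(reverberant))
    return l1(dry_cycle, dry) + l1(rev_cycle, reverberant)

# Hypothetical stand-ins for the two generator networks: scaling by 2.0
# and by 0.5 are exact inverses, so both cycles reconstruct their input.
toy_reverb = lambda x: 2.0 * x
toy_dereverb = lambda x: 0.5 * x

rng = np.random.default_rng(0)
dry_batch = rng.standard_normal((4, 16000))  # 4 unrelated "dry" clips
rev_batch = rng.standard_normal((4, 16000))  # 4 unrelated "reverberant" clips

loss = cycle_consistency_loss(dry_batch, rev_batch, toy_dereverb, toy_reverb)
print(loss)  # 0.0, because the toy mappings invert each other exactly
```

In a real system the two mappings would be neural networks, and the cycle term would be combined with discriminator losses that push each generator's output toward the distribution of the other domain.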

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Phonetics and Phonology Research