Transformation of low-quality device-recorded speech to high-quality speech using improved SEGAN model

Seyyed Saeed Sarfjoo; Xin Wang; Gustav Eje Henter; Jaime Lorenzo-Trueba; Shinji Takaki; Junichi Yamagishi

arXiv:1911.03952 · cs.SD · November 21, 2019 · 30 cites

TL;DR

This paper presents an improved SEGAN model that transforms low-quality device-recorded speech into high-quality speech. It introduces a new device-recorded speech dataset and two modifications that make training more robust and stable, and a large-scale listening test shows a significant improvement in speech quality.

Contribution

The paper introduces an enhanced SEGAN-based approach with stability improvements and a new dataset for transforming low-quality speech into high-quality audio.

Findings

01 Significant speech quality improvements shown in listening tests

02 Robust and stable training of the enhanced SEGAN model

03 Effective transformation from low-quality to high-quality speech signals

Abstract

Nowadays, vast amounts of speech data are recorded with low-quality recording devices such as smartphones, tablets, laptops, and medium-quality microphones. The objective of this research was to study the automatic generation of high-quality speech from such low-quality device-recorded speech, which could then be applied to many speech-generation tasks. In this paper, we first introduce our new device-recorded speech dataset, then propose an improved end-to-end method for automatically transforming the low-quality device-recorded speech into professional high-quality speech. Our method is an extension of a generative adversarial network (GAN)-based speech enhancement model called speech enhancement GAN (SEGAN), and we present two modifications to make model training more robust and stable. Finally, from a large-scale listening test, we show that our method can significantly enhance the…

Code & Models

Repositories

ssarfjoo/improvedsegan · tf · Official

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis