VoiceFixer: Toward General Speech Restoration with Neural Vocoder

Haohe Liu; Qiuqiang Kong; Qiao Tian; Yan Zhao; DeLiang Wang; Chuanzeng Huang; Yuxuan Wang

arXiv:2109.13731 · cs.SD · October 6, 2021 · 25 cites

TL;DR

VoiceFixer is a generative neural framework for general speech restoration. It removes multiple distortions simultaneously and outperforms single-task methods on both synthetic and real-world degraded speech.

Contribution

The paper defines the general speech restoration (GSR) task, in which multiple distortions are removed at once rather than one per model, and proposes VoiceFixer to address it. VoiceFixer pairs a ResUNet analysis stage with a neural vocoder synthesis stage.

Findings

01. VoiceFixer outperforms single-task speech enhancement models on mean opinion score (MOS).

02. The model generalizes well to severely degraded real speech recordings.

03. VoiceFixer effectively restores old and historical speech recordings.

Abstract

Speech restoration aims to remove distortions in speech signals. Prior methods mainly focus on single-task speech restoration (SSR), such as speech denoising or speech declipping. However, SSR systems only focus on one task and do not address the general speech restoration problem. In addition, previous SSR systems show limited performance in some speech restoration tasks such as speech super-resolution. To overcome those limitations, we propose a general speech restoration (GSR) task that attempts to remove multiple distortions simultaneously. Furthermore, we propose VoiceFixer, a generative framework to address the GSR task. VoiceFixer consists of an analysis stage and a synthesis stage to mimic the speech analysis and comprehension of the human auditory system. We employ a ResUNet to model the analysis stage and a neural vocoder to model the synthesis stage. We evaluate VoiceFixer…
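
The difference between the single-task and general settings can be illustrated with a small degradation sketch. The abstract names denoising, declipping, and super-resolution as single tasks; a GSR input carries all of the corresponding distortions at once. The code below is only an illustration under stated assumptions: the function names, the parameter values, and the crude sample-and-hold bandwidth reduction are hypothetical and are not taken from the paper, whose actual data simulation is not described in this excerpt.

```python
import math
import random


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    power = sum(x * x for x in signal) / len(signal)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def clip(signal, threshold):
    """Hard-clip every sample to the range [-threshold, threshold]."""
    return [max(-threshold, min(threshold, x)) for x in signal]


def reduce_bandwidth(signal, factor):
    """Crude low-resolution stand-in: hold every `factor`-th sample.

    Assumption for illustration only; no anti-aliasing filter is applied.
    """
    return [signal[(i // factor) * factor] for i in range(len(signal))]


def degrade(signal, rng, snr_db=10.0, clip_threshold=0.5, factor=4):
    """Stack several distortions, as a GSR system would have to undo jointly."""
    out = reduce_bandwidth(signal, factor)
    out = add_noise(out, snr_db, rng)
    out = clip(out, clip_threshold)
    return out


# One second of a 220 Hz tone stands in for clean speech (hypothetical input).
sample_rate = 16000
clean = [0.8 * math.sin(2 * math.pi * 220 * n / sample_rate)
         for n in range(sample_rate)]
degraded = degrade(clean, random.Random(0))
```

A single-task system would be trained to invert only one of `add_noise`, `clip`, or `reduce_bandwidth`; in the general setting, the restoration model receives the output of `degrade` and must recover `clean`.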

Code & Models

Models

Diogodiogod/voicefixer-models (Hugging Face model)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research