Neural Vocoders as Speech Enhancers

Andong Li, Zhihang Sun, Fengyuan Hao, Xiaodong Li, and Chengshi Zheng

arXiv:2501.13465 · cs.SD · January 24, 2025

TL;DR

This paper explores unifying speech enhancement and neural vocoding tasks by leveraging their shared rank behavior, demonstrating that models can be adapted or jointly trained to perform both tasks effectively.

Contribution

The paper shows that existing speech enhancement models can be adapted for vocoding, and that a single jointly trained model can handle both tasks with performance comparable to separately trained models.

Findings

01. Existing speech enhancement models can be trained for vocoding.

02. A single model can jointly perform speech enhancement and vocoding.

03. Joint training achieves performance comparable to task-specific models.

Abstract

Speech enhancement (SE) and neural vocoding are traditionally viewed as separate tasks. In this work, we observe them under a common thread: the rank behavior of these processes. This observation prompts two key questions: "Can a model designed for one task's rank degradation be adapted for the other?" and "Is it possible to address both tasks using a unified model?" Our empirical findings demonstrate that existing speech enhancement models can be successfully trained to perform vocoding tasks, and a single model, when jointly trained, can effectively handle both tasks with performance comparable to separately trained models. These results suggest that speech enhancement and neural vocoding can be unified under a broader framework of speech restoration. Code: https://github.com/Andong-Li-speech/Neural-Vocoders-as-Speech-Enhancers.
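
The abstract does not spell out what "rank behavior" means, so the following is only a toy NumPy sketch of one way the two degradations can change the rank of a spectrogram-like matrix. It is not the paper's analysis or code. The matrix sizes, the linearly spaced triangular filterbank (a stand-in for a mel filterbank), and the random "spectrograms" are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_frames, n_bands = 64, 100, 16  # hypothetical sizes

# Toy triangular filterbank (assumption: linear spacing, not a true mel scale).
centers = np.linspace(0, n_freq - 1, n_bands + 2)
bins = np.arange(n_freq)
fb = np.zeros((n_bands, n_freq))
for i in range(n_bands):
    left, mid, right = centers[i], centers[i + 1], centers[i + 2]
    rising = (bins - left) / (mid - left)
    falling = (right - bins) / (right - mid)
    fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)

# Vocoding-style degradation: band compression caps the rank at n_bands,
# even after projecting back to the full frequency axis.
target = rng.random((n_freq, n_frames))        # stand-in full-rank spectrogram
compressed = fb @ target                       # shape (n_bands, n_frames)
back_projected = np.linalg.pinv(fb) @ compressed

# Enhancement-style degradation: additive noise on a low-rank "clean" matrix
# generically raises the rank to full.
clean = rng.random((n_freq, 8)) @ rng.random((8, n_frames))
noisy = clean + 0.1 * rng.random((n_freq, n_frames))

rank_target = np.linalg.matrix_rank(target)
rank_back_projected = np.linalg.matrix_rank(back_projected)
rank_clean = np.linalg.matrix_rank(clean)
rank_noisy = np.linalg.matrix_rank(noisy)
print(rank_target, rank_back_projected, rank_clean, rank_noisy)
```

In this toy setup both inputs differ from their targets in rank, which is one way to read the "common thread" the abstract refers to; how the authors actually measure and use rank should be taken from the paper and repository.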

Code & Models

Repositories

andong-li-speech/neural-vocoders-as-speech-enhancers
PyTorch · Official

Taxonomy

Topics: Language Development and Disorders · Phonetics and Phonology Research · Speech and dialogue systems