MVNet: Memory Assistance and Vocal Reinforcement Network for Speech Enhancement

Jianrong Wang; Xiaomin Li; Xuewei Li; Mei Yu; Qiang Fang; Li Liu

arXiv:2209.07302·cs.SD·September 16, 2022

Open Access

TL;DR

MVNet is a novel speech enhancement framework that integrates memory assistance and vocal reinforcement modules to simultaneously improve automatic speech recognition and speaker verification performance.

Contribution

The paper introduces MVNet, which combines a memory assistance module and a vocal reinforcement module with a new loss function to improve both downstream automatic speech recognition (ASR) and automatic speaker verification (ASV).

Findings

01 Outperforms baseline methods in speech quality and intelligibility.

02 Improves speaker vocal similarity metrics.

03 Enhances downstream ASR and ASV performance on Libri2mix.

Abstract

Speech enhancement improves speech quality and benefits the performance of various downstream tasks. However, most current speech enhancement work has focused on improving downstream automatic speech recognition (ASR), and only a relatively small amount has addressed the automatic speaker verification (ASV) task. In this work, we propose MVNet, which consists of a memory assistance module that improves the performance of downstream ASR and a vocal reinforcement module that boosts the performance of ASV. In addition, we design a new loss function to improve speaker vocal similarity. Experimental results on the Libri2mix dataset show that our method outperforms baseline methods on several metrics, including speech quality, intelligibility, and speaker vocal similarity.

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Voice and Speech Disorders