MVNet: Memory Assistance and Vocal Reinforcement Network for Speech Enhancement
Jianrong Wang, Xiaomin Li, Xuewei Li, Mei Yu, Qiang Fang, Li Liu

TL;DR
MVNet is a novel speech enhancement framework that integrates memory assistance and vocal reinforcement modules to simultaneously improve automatic speech recognition and speaker verification performance.
Contribution
The paper introduces MVNet, which combines a memory assistance module and a vocal reinforcement module with a new loss function to improve both downstream automatic speech recognition (ASR) and automatic speaker verification (ASV) performance.
Findings
Outperforms baseline methods in speech quality and intelligibility.
Improves speaker vocal similarity metrics.
Enhances downstream ASR and ASV performance on Libri2mix.
Abstract
Speech enhancement improves speech quality and boosts the performance of various downstream tasks. However, most current speech enhancement work has focused on improving downstream automatic speech recognition (ASR), and only a relatively small amount of work has addressed the automatic speaker verification (ASV) task. In this work, we propose MVNet, which consists of a memory assistance module that improves downstream ASR performance and a vocal reinforcement module that boosts ASV performance. In addition, we design a new loss function to improve speaker vocal similarity. Experimental results on the Libri2mix dataset show that our method outperforms baseline methods on several metrics, including speech quality, intelligibility, and speaker vocal similarity.
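The abstract does not spell out the form of the new loss, so the following is only an illustrative sketch of one common way to encourage speaker vocal similarity, not the authors' formulation. It assumes that speaker embeddings for the enhanced and clean signals are already available as plain lists of floats, and it adds a cosine-distance penalty to some existing reconstruction loss. The function names and the `weight` parameter are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def speaker_similarity_loss(enhanced_emb, clean_emb):
    """Cosine distance: 0 when the embeddings point the same way, 2 when opposite."""
    return 1.0 - cosine_similarity(enhanced_emb, clean_emb)


def combined_loss(reconstruction_loss, enhanced_emb, clean_emb, weight=0.1):
    """Assumed combination: reconstruction term plus a weighted similarity term."""
    return reconstruction_loss + weight * speaker_similarity_loss(enhanced_emb, clean_emb)
```

In a sketch like this, `weight` trades off signal reconstruction against speaker similarity; the value actually used in the paper, if any such weighting exists, is not stated in the abstract.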
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Voice and Speech Disorders
