Speech Enhancement with Multi-granularity Vector Quantization

Xiao-Ying Zhao; Qiu-Shi Zhu; Jie Zhang

arXiv:2302.08342 · eess.AS · February 17, 2023

TL;DR

This paper explores applying vector quantization (VQ) at multiple granularities, at different layers of a neural network and combined with an attention mechanism, to improve speech enhancement by capturing both global and local speech features.

Contribution

The paper introduces an approach that applies VQ at multiple layers with different numbers of codebooks and combines it with a pre-trained model and an attention mechanism to improve speech denoising.
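To make the idea of quantizing with different numbers of codebooks concrete, here is a minimal NumPy sketch of nearest-neighbor VQ. It is an illustration under stated assumptions, not the authors' implementation: the function name `vector_quantize`, the choice to split the feature dimension evenly across codebooks (product-quantization style), and the random codebooks are all hypothetical. In a trained model the codebooks would be learned.

```python
import numpy as np

def vector_quantize(features, codebooks):
    """Quantize frame-level features with one or more codebooks.

    Assumption (not from the paper): the feature dimension is split into
    equal groups, and each group is quantized by its own codebook.

    features:  (T, D) array of frame-level features.
    codebooks: list of G arrays, each of shape (K, D // G).
    Returns the quantized (T, D) array and a (T, G) array of code indices.
    """
    groups = np.split(features, len(codebooks), axis=1)
    quantized, indices = [], []
    for group, codebook in zip(groups, codebooks):
        # Squared Euclidean distance from every frame to every code: (T, K).
        dists = ((group[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)          # nearest code per frame
        quantized.append(codebook[idx])     # replace each frame by its code
        indices.append(idx)
    return np.concatenate(quantized, axis=1), np.stack(indices, axis=1)

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8))                      # 5 frames, 8 dimensions

# Coarse setting: a single codebook over the whole feature vector.
coarse = [rng.normal(size=(4, 8))]
# Finer setting: two codebooks, each covering half of the dimensions.
fine = [rng.normal(size=(4, 4)) for _ in range(2)]

q_coarse, i_coarse = vector_quantize(feats, coarse)
q_fine, i_fine = vector_quantize(feats, fine)
```

With one codebook of 4 entries, each frame can take only 4 distinct quantized values; with two such codebooks it can take up to 16 combinations. This is one simple way in which the number of codebooks changes the granularity of the discretized representation.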

Findings

01. Improved speech enhancement results on the Valentini dataset.
02. Multi-granularity VQ effectively captures diverse speech features.
03. Pre-trained models significantly impact enhancement performance.

Abstract

With advances in deep learning, neural-network-based speech enhancement (SE) has developed rapidly over the last decade. Meanwhile, self-supervised pre-trained models and vector quantization (VQ) have achieved excellent performance on many speech-related tasks, but they remain less explored for SE. Our previous work showed that using a VQ module to discretize noisy speech representations benefits speech denoising; in this work we therefore study the impact of applying VQ at different layers with different numbers of codebooks. Different VQ modules indeed make it possible to extract speech features at multiple granularities. Using an attention mechanism, the contextual features extracted by a pre-trained model are fused with the local features extracted by the encoder, so that both global and local information is preserved when reconstructing the enhanced speech. Experimental…
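The attention-based fusion described in the abstract can be sketched as scaled dot-product cross-attention. This is a hedged illustration only: the abstract does not say which stream provides the queries, so the sketch assumes the encoder's local features act as queries and the pre-trained model's contextual features act as keys and values. The function name `attention_fuse`, the residual sum, and the absence of learned projection matrices are simplifications chosen for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(local, context):
    """Fuse local encoder features with contextual features.

    Assumption (not from the paper): local features are the queries,
    contextual features are both keys and values, and the attended
    context is added back to the local features.

    local:   (T, D) features from the encoder.
    context: (S, D) features from a pre-trained model.
    Returns the fused (T, D) features and the (T, S) attention weights.
    """
    d = local.shape[-1]
    scores = local @ context.T / np.sqrt(d)   # (T, S) similarity scores
    weights = softmax(scores, axis=-1)        # each row sums to 1
    attended = weights @ context              # (T, D) context summary per frame
    return local + attended, weights

rng = np.random.default_rng(0)
local = rng.normal(size=(6, 16))     # 6 encoder frames
context = rng.normal(size=(4, 16))   # 4 contextual frames
fused, weights = attention_fuse(local, context)
```

Each output frame keeps its own local feature and adds a weighted average of the contextual frames, which is one simple way to retain both local and global information in the fused representation.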

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Indoor and Outdoor Localization Technologies