SNR-Progressive Model with Harmonic Compensation for Low-SNR Speech Enhancement

Zhongshu Hou, Tong Lei, Qinwen Hu, Zhanzhong Cao, Ming Tang, and Jing Lu

arXiv:2406.16317 · cs.SD · August 20, 2024

TL;DR

This paper introduces an SNR-progressive speech enhancement model with harmonic compensation that improves low-SNR speech quality by utilizing reliable pitch estimation and harmonic recovery techniques, outperforming existing methods.

Contribution

The paper proposes a novel SNR-progressive model with harmonic compensation that enhances low-SNR speech by leveraging intermediate pitch estimation and harmonic recovery mechanisms.

Findings

01

The proposed model outperforms existing methods in low-SNR speech enhancement.

02

A multi-modal speech extraction system based on this model ranks first in the ICASSP 2024 MISP Challenge.

03

Extensive experiments validate the effectiveness of the proposed approach.

Abstract

Despite significant progress made in the last decade, deep neural network (DNN) based speech enhancement (SE) still faces the challenge of notable degradation in the quality of recovered speech under low signal-to-noise ratio (SNR) conditions. In this letter, we propose an SNR-progressive speech enhancement model with harmonic compensation for low-SNR SE. Reliable pitch estimation is obtained from the intermediate output, which has the benefit of retaining more speech components than the coarse estimate while possessing a significantly higher SNR than the input noisy speech. An effective harmonic compensation mechanism is introduced for better harmonic recovery. Extensive experiments demonstrate the advantage of our proposed model. A multi-modal speech extraction system based on the proposed backbone model ranks first in the ICASSP 2024 MISP Challenge:…
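The abstract does not spell out how the harmonic compensation mechanism works, so the following is only a generic illustration of the underlying idea: once a pitch estimate is available for a frame, spectral bins near integer multiples of that pitch can be emphasized. This is a minimal sketch and not the authors' mechanism. The function names `harmonic_comb` and `compensate`, the Gaussian comb shape, and the parameters `bandwidth_hz` and `strength` are all assumptions introduced here for illustration.

```python
import numpy as np

def harmonic_comb(f0_hz, n_fft=512, sample_rate=16000, bandwidth_hz=40.0):
    """Gain in [0, 1] per rFFT bin, peaking at integer multiples of f0_hz.

    Hypothetical helper: a Gaussian comb is an assumption, not the paper's design.
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)  # bin centre frequencies in Hz
    if f0_hz <= 0:
        # Treat non-positive pitch as an unvoiced frame: nothing to emphasize.
        return np.zeros_like(freqs)
    # Index of the nearest harmonic (at least the first) for every bin.
    k = np.maximum(np.round(freqs / f0_hz), 1.0)
    dist = np.abs(freqs - k * f0_hz)  # distance to that harmonic in Hz
    return np.exp(-0.5 * (dist / bandwidth_hz) ** 2)

def compensate(magnitude, f0_hz, strength=0.5, **comb_kwargs):
    """Boost one frame's magnitude spectrum near the harmonics of f0_hz.

    `magnitude` must have n_fft // 2 + 1 bins, matching harmonic_comb's n_fft.
    """
    comb = harmonic_comb(f0_hz, **comb_kwargs)
    return magnitude * (1.0 + strength * comb)

# Usage: a flat spectrum with an assumed 250 Hz pitch estimate.
frame = np.ones(257)
boosted = compensate(frame, 250.0)
```

In a progressive pipeline of the kind the abstract describes, the pitch fed to such a step would come from an intermediate, partially denoised output rather than from the noisy input; how the authors estimate and apply it is not stated in this summary.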

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques