The NPU-Elevoc Personalized Speech Enhancement System for ICASSP2023 DNS Challenge
Xiaopeng Yan, Yindi Yang, Zhihao Guo, Liangliang Peng, Lei Xie

TL;DR
This paper presents NPU-Elevoc's personalized speech enhancement system for the ICASSP 2023 DNS Challenge. By improving speaker embedding fusion and the training pipeline and by adding adversarial training and a multi-scale loss, the system tied for 1st place in the headset track and ranked 2nd in the speakerphone track.
Contribution
The paper builds on the two-stage TEA-PSE 2.0 model. It improves the strategy for fusing the speaker embedding into the network and optimizes the training pipeline to raise personalized speech enhancement performance.
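The page does not say how NAPSE fuses the speaker embedding into the enhancement network. As a hypothetical illustration of what "speaker embedding fusion" means, the sketch below uses FiLM-style feature-wise modulation. This is one common option and not necessarily the authors' choice. An embedding computed from the target speaker's enrollment speech predicts a per-channel scale and shift, and these are applied to intermediate activations. The function name `film_fuse`, the array shapes, and the random weights are all assumptions made for illustration.

```python
import numpy as np

def film_fuse(features, spk_emb, w_gamma, b_gamma, w_beta, b_beta):
    """Hypothetical FiLM-style fusion of a speaker embedding (illustration only).

    features: (frames, channels) intermediate network activations
    spk_emb:  (emb_dim,) utterance-level speaker embedding
    w_gamma, w_beta: (emb_dim, channels) projection weights (learned in practice)
    b_gamma, b_beta: (channels,) projection biases
    """
    gamma = spk_emb @ w_gamma + b_gamma  # per-channel scale, shape (channels,)
    beta = spk_emb @ w_beta + b_beta     # per-channel shift, shape (channels,)
    return features * gamma + beta       # broadcast over the frame axis

# Usage with random, untrained weights (shapes are arbitrary assumptions).
rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 64))   # 100 frames, 64 channels
emb = rng.standard_normal(128)           # 128-dim speaker embedding
fused = film_fuse(
    feats, emb,
    0.01 * rng.standard_normal((128, 64)), np.ones(64),
    0.01 * rng.standard_normal((128, 64)), np.zeros(64),
)
print(fused.shape)
```

Other fusion options include concatenating the embedding to every frame or using attention. Modulation of this kind keeps the channel count unchanged, so it can be inserted between existing layers.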
Findings
Tied for 1st place in the headset track (track 1) of the ICASSP 2023 DNS Challenge.
Ranked 2nd in the speakerphone track (track 2) of the ICASSP 2023 DNS Challenge.
Demonstrated effectiveness of adversarial training and multi-scale loss.
Abstract
This paper describes our NPU-Elevoc personalized speech enhancement system (NAPSE) for the 5th Deep Noise Suppression Challenge at ICASSP 2023. Building on the superior two-stage model TEA-PSE 2.0, our system explores a better strategy for speaker embedding fusion, optimizes the model training pipeline, and leverages adversarial training and a multi-scale loss. According to the challenge results, our system is tied for 1st place in the headset track (track 1) and ranked 2nd in the speakerphone track (track 2).
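The abstract does not define the multi-scale loss. A common choice in speech enhancement is a multi-resolution STFT loss. This loss compares the estimated and reference magnitude spectra at several FFT sizes, so errors are penalized at both fine and coarse time-frequency resolutions. The sketch below assumes that formulation: spectral convergence plus a log-magnitude L1 term, averaged over three hypothetical resolutions. `multi_scale_stft_loss`, `stft_mag`, and the default resolutions are illustrative and are not the authors' configuration.

```python
import numpy as np

def stft_mag(x, n_fft, hop):
    """Magnitude STFT with a Hann window and no padding (assumes len(x) >= n_fft)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))  # shape (n_frames, n_fft // 2 + 1)

def multi_scale_stft_loss(est, ref, resolutions=((512, 128), (1024, 256), (256, 64)), eps=1e-7):
    """Hypothetical multi-resolution STFT loss: mean over (n_fft, hop) pairs."""
    total = 0.0
    for n_fft, hop in resolutions:
        s_est = stft_mag(est, n_fft, hop)
        s_ref = stft_mag(ref, n_fft, hop)
        # Spectral convergence: relative Frobenius-norm error of the magnitudes.
        sc = np.linalg.norm(s_ref - s_est) / (np.linalg.norm(s_ref) + eps)
        # Mean absolute error between log magnitudes.
        log_mag = np.mean(np.abs(np.log(s_ref + eps) - np.log(s_est + eps)))
        total += sc + log_mag
    return total / len(resolutions)

# Usage on one second of synthetic 16 kHz audio (random signals, not speech).
rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)
est = ref + 0.1 * rng.standard_normal(16000)
print(multi_scale_stft_loss(est, ref))
```

Small FFT sizes give good time resolution and large ones give good frequency resolution. Averaging over several sizes avoids tuning the loss to a single trade-off.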
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
