Improving Speech Enhancement via Event-based Query
Yifei Xin, Xiulian Peng, Yan Lu

TL;DR
This paper introduces an event-based query approach for speech enhancement: speech embeddings pre-trained on sound event detection are clustered into fixed queries that guide the enhancement network, improving performance with little extra complexity and no speaker enrollment.
Contribution
The paper proposes an event-based query method in which pre-trained speech embeddings are clustered into fixed "golden speech queries" that generalize across speech enhancement datasets and networks, adding little complexity and requiring no per-speaker enrollment.
Findings
Significant performance improvements over baseline methods.
Golden speech queries generalize well across datasets.
No additional enrollment needed for new speakers.
Abstract
Existing deep-learning-based speech enhancement (SE) methods either use blind end-to-end training or explicitly incorporate speaker embeddings or phonetic information into the SE network to enhance speech quality. In this paper, we perceive speech and noise as different types of sound events and propose an event-based query method for SE. Specifically, representative speech embeddings that can discriminate speech from noise are first pre-trained on the sound event detection (SED) task. The embeddings are then clustered into fixed golden speech queries that help the SE network enhance the speech in noisy audio. The golden speech queries can be obtained offline and generalize to different SE datasets and networks. Therefore, little extra complexity is introduced and no enrollment is needed for each speaker. Experimental results show that the proposed method yields significant…
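The clustering step described in the abstract can be illustrated with a small sketch. The abstract does not say which clustering algorithm is used, so plain k-means is an assumption here, and the function names (`cluster_golden_queries` and its helpers) and the toy data are hypothetical stand-ins rather than the authors' implementation. The example only shows how a pool of embeddings might be reduced offline to a small, fixed set of query vectors.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cluster_golden_queries(embeddings, num_queries, num_iters=50, seed=0):
    """Reduce a pool of embeddings to `num_queries` centroids with k-means.

    Assumption: k-means stands in for whatever clustering the paper uses.
    """
    rng = random.Random(seed)
    # Initialise centroids from distinct, randomly chosen embeddings.
    centroids = [list(v) for v in rng.sample(embeddings, num_queries)]
    for _ in range(num_iters):
        # Assignment step: attach every embedding to its nearest centroid.
        groups = [[] for _ in range(num_queries)]
        for vec in embeddings:
            nearest = min(
                range(num_queries),
                key=lambda k: squared_distance(vec, centroids[k]),
            )
            groups[nearest].append(vec)
        # Update step: move each centroid to the mean of its members,
        # keeping the old centroid if a cluster happens to be empty.
        new_centroids = [
            mean_vector(group) if group else centroids[k]
            for k, group in enumerate(groups)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the result will not change
        centroids = new_centroids
    return centroids


# Toy stand-in for SED-pretrained speech embeddings (hypothetical data):
# two well-separated groups of 8-dimensional vectors.
toy_rng = random.Random(1)
toy_embeddings = [
    [toy_rng.gauss(center, 0.1) for _ in range(8)]
    for center in (0.0, 5.0)
    for _ in range(50)
]
golden_queries = cluster_golden_queries(toy_embeddings, num_queries=2)
```

Because the centroids are computed once and then frozen, they could be stored and reused across datasets and networks, which is consistent with the abstract's point that the queries are obtained offline and need no per-speaker enrollment.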
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
