Task-specific Optimization of Virtual Channel Linear Prediction-based Speech Dereverberation Front-End for Far-Field Speaker Verification
Joon-Young Yang, Joon-Hyuk Chang

TL;DR
This paper introduces a VACE-WPE speech dereverberation front-end, optimized task-specifically for far-field speaker verification, that is explicitly trained to cancel both late reverberation and background noise; it is reported to outperform neural methods.
Contribution
The study presents a novel task-specific optimization framework for VACE-WPE, enhancing speech dereverberation for speaker verification in challenging acoustic environments.
Findings
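Unlike conventional WPE, VACE-WPE can be explicitly trained to cancel both late reverberation and background noise.
Fine-tuning the pretrained VACE-WPE within the task-specific optimization (TSO) framework, given a pretrained speaker embedding model, improves speaker verification accuracy.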
The method generalizes well to in-the-wild scenarios beyond controlled conditions.
Abstract
Developing a single-microphone speech denoising or dereverberation front-end for robust automatic speaker verification (ASV) in noisy far-field speaking scenarios is challenging. To address this problem, we present a novel front-end design that involves a recently proposed extension of the weighted prediction error (WPE) speech dereverberation algorithm, the virtual acoustic channel expansion (VACE)-WPE. It is demonstrated experimentally in this study that unlike the conventional WPE algorithm, the VACE-WPE can be explicitly trained to cancel out both late reverberation and background noise. To build the front-end, the VACE-WPE is first independently (pre)trained to produce "noisy" dereverberated signals. Subsequently, given a pretrained speaker embedding model, the VACE-WPE is additionally fine-tuned within a task-specific optimization (TSO) framework, causing the speaker embedding…
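As a rough illustration of the WPE building block the abstract refers to, the following is a minimal NumPy sketch of conventional iterative WPE for a single frequency bin, plus a wrapper showing how a virtual channel could be stacked with the real one. It is not the authors' implementation: the function names, the default `taps`, `delay`, and `iterations` values, and the idea that the virtual signal is supplied by some external neural network are all assumptions made for illustration.

```python
import numpy as np

def wpe_one_bin(Y, taps=10, delay=3, iterations=3, eps=1e-8):
    """Conventional WPE on one STFT frequency bin (illustrative sketch).

    Y: complex array of shape (channels, frames).
    Returns the dereverberated estimate with the same shape.
    """
    D, T = Y.shape
    # Stack delayed copies of the observation: shape (D * taps, T).
    Y_tilde = np.zeros((D * taps, T), dtype=Y.dtype)
    for k in range(taps):
        shift = delay + k
        if shift < T:
            Y_tilde[k * D:(k + 1) * D, shift:] = Y[:, :T - shift]

    X = Y.copy()
    for _ in range(iterations):
        # Time-varying power of the current estimate, averaged over channels.
        lam = np.maximum(np.mean(np.abs(X) ** 2, axis=0), eps)
        Yw = Y_tilde / lam
        R = Yw @ Y_tilde.conj().T   # weighted correlation matrix
        P = Yw @ Y.conj().T         # weighted cross-correlation
        G = np.linalg.solve(R + eps * np.eye(D * taps), P)
        # Subtract the predicted late reverberation.
        X = Y - G.conj().T @ Y_tilde
    return X

def vace_wpe_one_bin(y_real, y_virtual, **wpe_kwargs):
    """Hypothetical wrapper: run dual-channel WPE on a real plus a virtual channel.

    y_virtual is assumed to come from a separately trained network
    (not shown here); only the real-channel output is returned.
    """
    Y = np.stack([y_real, y_virtual])
    return wpe_one_bin(Y, **wpe_kwargs)[0]
```

In this sketch, the first `delay` frames pass through unchanged because no delayed observations are available to predict from. The pretraining and TSO fine-tuning stages described in the abstract would update the network that produces `y_virtual`; they are outside the scope of this example.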
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
