Informed Source Extraction With Application to Acoustic Echo Reduction
Mohamed Elminshawi, Wolfgang Mack, and Emanuël A. P. Habets

TL;DR
This paper introduces a time-varying source discriminative model for informed source extraction. The model exploits a reference signal that is temporally correlated with the target signal, which the authors report significantly improves acoustic echo reduction performance over existing approaches.
Contribution
The paper proposes a novel time-varying source discriminative model that captures the temporal dynamics of the reference signal, and shows that both this model and existing methods can be generalized to non-speech sources.
Findings
The proposed method significantly improves acoustic echo reduction performance.
The proposed method outperforms existing speaker extraction techniques.
Both the proposed method and existing methods are applicable to non-speech source separation scenarios.
Abstract
Informed speaker extraction aims to extract a target speech signal from a mixture of sources given prior knowledge about the desired speaker. Recent deep learning-based methods leverage a speaker discriminative model that maps a reference snippet uttered by the target speaker into a single embedding vector that encapsulates the characteristics of the target speaker. However, such modeling deliberately neglects the time-varying properties of the reference signal. In this work, we assume that a reference signal is available that is temporally correlated with the target signal. To take this correlation into account, we propose a time-varying source discriminative model that captures the temporal dynamics of the reference signal. We also show that existing methods and the proposed method can be generalized to non-speech sources as well. Experimental results demonstrate that the proposed…
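The contrast drawn in the abstract, between a single embedding vector and a representation that keeps the temporal dynamics of the reference, can be illustrated with a minimal sketch. This is not the authors' architecture: the framing helper, the random projection `proj` (standing in for a learned network), and all function names and sizes are assumptions made for illustration only.

```python
import numpy as np

def frame_features(signal, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames; return log-magnitude spectra."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
    window = np.hanning(frame_len)
    spec = np.abs(np.fft.rfft(frames * window, axis=1))
    return np.log1p(spec)  # shape: (n_frames, frame_len // 2 + 1)

def time_invariant_embedding(feats, proj):
    """Hypothetical stand-in for a speaker discriminative model:
    per-frame projections are averaged over time into ONE vector."""
    return np.tanh(feats @ proj).mean(axis=0)  # shape: (emb_dim,)

def time_varying_embedding(feats, proj):
    """Hypothetical stand-in for a time-varying source discriminative model:
    one embedding per frame, so temporal dynamics are preserved."""
    return np.tanh(feats @ proj)  # shape: (n_frames, emb_dim)

rng = np.random.default_rng(0)
reference = rng.standard_normal(4000)  # synthetic reference signal (assumption)
feats = frame_features(reference)
proj = 0.1 * rng.standard_normal((feats.shape[1], 16))  # random, not learned

e_fixed = time_invariant_embedding(feats, proj)
e_seq = time_varying_embedding(feats, proj)
print(e_fixed.shape, e_seq.shape)
```

In this sketch the time-invariant embedding is simply the temporal average of the time-varying one, which makes explicit what is discarded when the reference is collapsed into a single vector. A downstream extraction network would be conditioned on `e_fixed` in the first case and on the frame-aligned sequence `e_seq` in the second.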
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
