Single microphone speaker extraction using unified time-frequency Siamese-Unet

Aviad Eisenberg; Sharon Gannot; Shlomo E. Chazan

arXiv:2203.02941 · cs.SD · March 8, 2022

Open Access

TL;DR

This paper introduces a unified time-frequency Siamese-Unet architecture for single-microphone speaker extraction that outperforms state-of-the-art methods by leveraging both time-domain and frequency-domain information.
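
The time and frequency domains are linked by the short-time Fourier transform (STFT) and its inverse: because the inverse can reconstruct the signal exactly, a spectrum estimated in the frequency domain can be turned back into a waveform and scored in the time domain. The NumPy sketch below shows such an STFT/iSTFT pair. The frame length (512), hop (128) and periodic Hann window are assumptions for illustration, not settings taken from the paper.

```python
import numpy as np

# Assumed analysis settings (not from the paper): 512-point frames, hop of 128.
N_FFT = 512
HOP = 128


def _window(n_fft):
    # Periodic Hann window; its squares overlap-add to a constant at hop = n_fft / 4.
    return np.hanning(n_fft + 1)[:-1]


def stft(x, n_fft=N_FFT, hop=HOP):
    """Complex STFT of a 1-D signal, shape (frames, n_fft // 2 + 1)."""
    win = _window(n_fft)
    xp = np.pad(x, (n_fft, n_fft))  # pad so every original sample gets full window overlap
    n_frames = 1 + (len(xp) - n_fft) // hop
    frames = np.stack([xp[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)


def istft(spec, length, n_fft=N_FFT, hop=HOP):
    """Weighted overlap-add inverse of stft(); returns `length` samples."""
    win = _window(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=-1)
    out_len = (spec.shape[0] - 1) * hop + n_fft
    y = np.zeros(out_len)
    wsum = np.zeros(out_len)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_fft] += frame * win
        wsum[i * hop:i * hop + n_fft] += win ** 2
    nonzero = wsum > 1e-8
    y[nonzero] /= wsum[nonzero]  # undo the squared-window weighting
    return y[n_fft:n_fft + length]  # drop the padding added in stft()
```

A spectrum predicted as separate real and imaginary arrays can be recombined as `real + 1j * imag` and passed to `istft` to obtain a waveform on which a time-domain loss can be computed.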

Contribution

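The paper proposes a novel Siamese-Unet model that combines frequency-domain processing with a time-domain training objective: its encoders operate on spectra, its output is inverse-transformed to a waveform, and it is trained with a Scale-Invariant Signal-to-Distortion Ratio (SI-SDR) loss regularized by a frequency-domain loss.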
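
SI-SDR projects the estimate onto the reference and compares the energy of that projection with the energy of the residual, so rescaling the estimate does not change the score. The sketch below uses the common textbook definition with zero-mean signals and treats the training loss as negative SI-SDR; the `eps` guard and the function names are assumptions for illustration, and the paper's exact loss implementation may differ.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB between two 1-D waveforms."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Optimal scaling of the reference: projection of the estimate onto it.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(residual, residual) + eps))


def si_sdr_loss(estimate, reference):
    """Maximizing SI-SDR is the same as minimizing its negative."""
    return -si_sdr(estimate, reference)
```

For an estimate equal to the reference plus noise at one tenth of its amplitude, the score is close to 20 dB, and multiplying the estimate by any nonzero constant leaves it essentially unchanged.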

Findings

1. Outperforms state-of-the-art blind source separation (BSS) methods
2. Easier to train than existing models
3. Achieves superior speaker extraction results

Abstract

In this paper, we present a unified time-frequency method for speaker extraction in clean and noisy conditions. Given a mixed signal, along with a reference signal, the common approaches for extracting the desired speaker are applied either in the time domain or in the frequency domain. In our approach, we propose a Siamese-Unet architecture that uses both representations. The Siamese encoders are applied in the frequency domain to infer the embeddings of the noisy and reference spectra, respectively. The concatenated representations are then fed into the decoder to estimate the real and imaginary components of the desired speaker, which are then inverse-transformed to the time domain. The model is trained with the Scale-Invariant Signal-to-Distortion Ratio (SI-SDR) loss to exploit the time-domain information. The time-domain loss is also regularized with frequency-domain loss to preserve…
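
The data flow described in the abstract (one encoder with shared weights applied to both the mixture and the reference spectrum, concatenation of the two embeddings, and a decoder that outputs real and imaginary parts) can be illustrated with a deliberately tiny NumPy model. This is a hypothetical toy, not the authors' network: a single dense layer stands in for each U-Net encoder and decoder, there are no convolutions or skip connections, and both spectra are assumed to have the same number of frames so their embeddings can be concatenated frame by frame.

```python
import numpy as np


def to_features(spec):
    """Real and imaginary parts side by side: (frames, bins) -> (frames, 2 * bins)."""
    return np.concatenate([spec.real, spec.imag], axis=-1)


class ToySiameseExtractor:
    """Hypothetical, heavily simplified stand-in for a Siamese encoder/decoder."""

    def __init__(self, n_bins, emb_dim=64, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = 2 * n_bins
        self.n_bins = n_bins
        # A single encoder weight matrix used by both branches: the Siamese property.
        self.w_enc = rng.standard_normal((in_dim, emb_dim)) / np.sqrt(in_dim)
        # The decoder sees the concatenated mixture and reference embeddings.
        self.w_dec = rng.standard_normal((2 * emb_dim, in_dim)) / np.sqrt(2 * emb_dim)

    def encode(self, spec):
        return np.tanh(to_features(spec) @ self.w_enc)

    def __call__(self, mix_spec, ref_spec):
        # Assumption: mix_spec and ref_spec have the same shape (frames, n_bins).
        e_mix = self.encode(mix_spec)
        e_ref = self.encode(ref_spec)
        z = np.concatenate([e_mix, e_ref], axis=-1)
        out = z @ self.w_dec
        real, imag = out[:, :self.n_bins], out[:, self.n_bins:]
        # Estimated complex spectrum of the desired speaker.
        return real + 1j * imag
```

In the full method, the estimated spectrum would next be inverse-transformed to a waveform and scored with the SI-SDR loss; the toy only shows how weight sharing and concatenation fit together, which is why swapping the mixture and reference inputs changes the output even though both pass through the same encoder.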

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Blind Source Separation Techniques