Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation

William Ravenscroft; Stefan Goetze; Thomas Hain

arXiv:2210.15305 · cs.SD · March 13, 2023

TL;DR

This paper introduces deformable temporal convolutional networks that adapt their receptive fields for improved monaural noisy reverberant speech separation, achieving state-of-the-art results with fewer parameters.

Contribution

It proposes deformable convolution within TCNs to dynamically adapt receptive fields based on reverberation characteristics, enhancing speech separation performance.
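
As a rough illustration of the idea, the sketch below implements a single-channel deformable 1-D convolution in NumPy. It is a minimal sketch, not the authors' implementation: the function name `deformable_conv1d`, the centered tap layout, and the zero padding are assumptions made for this example. In deformable convolution the offsets are typically predicted from the input by a learned layer; here they are simply passed in as an array so the sampling step is easy to follow.

```python
import numpy as np

def deformable_conv1d(x, weights, offsets, dilation=1):
    """Single-channel deformable 1-D convolution (illustrative sketch).

    x:       (T,) input sequence
    weights: (K,) kernel taps, K assumed odd so the kernel is centered
    offsets: (T, K) fractional sampling offsets, one per output frame and tap
             (assumed given here; a real model would predict them)
    """
    T = x.shape[0]
    K = weights.shape[0]
    y = np.zeros(T)
    for t in range(T):
        for k in range(K):
            # Regular dilated tap position plus the fractional offset.
            pos = t + (k - (K - 1) / 2) * dilation + offsets[t, k]
            lo = int(np.floor(pos))
            frac = pos - lo
            # Linear interpolation between neighbors; zero outside the signal.
            left = x[lo] if 0 <= lo < T else 0.0
            right = x[lo + 1] if 0 <= lo + 1 < T else 0.0
            y[t] += weights[k] * ((1.0 - frac) * left + frac * right)
    return y

x = np.arange(5, dtype=float)
w = np.array([1.0, 2.0, 3.0])

# With all offsets at zero the layer reduces to an ordinary convolution.
y_fixed = deformable_conv1d(x, w, np.zeros((5, 3)))

# Non-zero offsets move each tap, changing where the kernel "looks".
y_shifted = deformable_conv1d(x, w, np.full((5, 3), 0.5))
```

With zero offsets the output matches a standard (non-flipped) sliding kernel, which is why deformable convolution can be seen as a generalization of the fixed-RF case.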

Findings

01. Achieved an 11.1 dB average SISDR improvement on the WHAMR benchmark.

02. A small deformable TCN with 1.3M parameters performs comparably to larger models.

03. Dynamic receptive field (RF) adaptation benefits reverberant speech separation.
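
For context on what a fixed RF means, the receptive field of a stack of standard dilated convolutions follows directly from the kernel size and the dilation factors. The sketch below computes it; the function name and the example configuration (kernel size 3, dilations 1 to 128, repeated 3 times) are illustrative assumptions, not values taken from the paper.

```python
def tcn_receptive_field(kernel_size, dilations):
    """Receptive field, in frames, of stacked dilated 1-D convolutions.

    Each layer with dilation d widens the RF by (kernel_size - 1) * d frames.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Hypothetical configuration: 8 layers with dilations 1, 2, ..., 128,
# with the whole stack repeated 3 times.
dilations = [2 ** i for i in range(8)] * 3
rf_frames = tcn_receptive_field(3, dilations)
print(rf_frames)  # 1531
```

Once the architecture is chosen this number is constant for every input, which is the limitation deformable convolution is meant to relax.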

Abstract

Speech separation models are used for isolating individual speakers in many speech processing applications. Deep learning models have been shown to lead to state-of-the-art (SOTA) results on a number of speech separation benchmarks. One such class of models, known as temporal convolutional networks (TCNs), has shown promising results for speech separation tasks. A limitation of these models is that they have a fixed receptive field (RF). Recent research in speech dereverberation has shown that the optimal RF of a TCN varies with the reverberation characteristics of the speech signal. In this work, deformable convolution is proposed as a solution to allow TCN models to have dynamic RFs that can adapt to various reverberation times for reverberant speech separation. The proposed models are capable of achieving an 11.1 dB average scale-invariant signal-to-distortion ratio (SISDR) improvement…
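
Since the headline result is reported as an SISDR improvement, the following sketch shows how that metric is commonly computed: the estimate is projected onto the reference to find the scaled target, and the ratio of target energy to residual energy is expressed in dB. The improvement is the SISDR of the separated estimate minus the SISDR of the unprocessed mixture. The function names, the `eps` stabilizer, and the synthetic signals are assumptions for this example, not the paper's evaluation code.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (standard definition)."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Optimal scaling of the reference that best explains the estimate.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    residual = estimate - target
    return 10.0 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    )

def si_sdr_improvement(estimate, mixture, reference):
    """SISDR gain of the estimate over the unprocessed mixture, in dB."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)

# Synthetic stand-ins for a target speaker and an interfering signal.
rng = np.random.default_rng(0)
reference = rng.standard_normal(16000)
interference = rng.standard_normal(16000)
mixture = reference + interference
estimate = reference + 0.1 * interference  # pretend separator output

improvement = si_sdr_improvement(estimate, mixture, reference)
print(f"SISDR improvement: {improvement:.1f} dB")
```

Because of the projection step, rescaling the estimate leaves the score unchanged, which is what makes the metric scale-invariant.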

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Indoor and Outdoor Localization Technologies

Methods: Convolution · Deformable Convolution