A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet

David Ditter; Timo Gerkmann

arXiv:1910.11615 · eess.AS · April 20, 2021

TL;DR

This paper introduces a multi-phase gammatone filterbank as a deterministic alternative to learned encoders in speech separation, achieving comparable or better performance with fewer filters and low latency.

Contribution

The authors propose a novel multi-phase gammatone filterbank that replaces learned encoders in Conv-TasNet, improving efficiency and performance in speech separation tasks.

Findings

01 Replacing the learned encoder with the proposed filterbank improves SI-SNR by 0.7 dB.

02 The number of filters can be reduced from 512 to 128 without performance loss.

03 Restricting the filters to 2 ms length allows for low-latency processing.
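
For readers unfamiliar with the metric behind the first finding, the following is a minimal sketch of the standard scale-invariant SNR computation. It is not taken from the paper or its repository: the function name `si_snr`, the `eps` stabilizer, and the synthetic test signals are assumptions made for illustration.

```python
import numpy as np


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals (illustrative helper)."""
    # Remove the mean so that a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Everything that is not explained by the reference counts as noise.
    noise = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))


# Synthetic example (assumed data): a reference plus a small amount of noise.
rng = np.random.default_rng(0)
reference = rng.standard_normal(8000)
estimate = reference + 0.1 * rng.standard_normal(8000)

score = si_snr(estimate, reference)
score_rescaled = si_snr(3.0 * estimate, reference)  # rescaling should not change the score
```

Because the estimate is projected onto the reference before the ratio is formed, multiplying the estimate by a constant leaves the score essentially unchanged, which is what "scale-invariant" refers to.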

Abstract

In this work, we investigate if the learned encoder of the end-to-end convolutional time domain audio separation network (Conv-TasNet) is the key to its recent success, or if the encoder can just as well be replaced by a deterministic hand-crafted filterbank. Motivated by the resemblance of the trained encoder of Conv-TasNet to auditory filterbanks, we propose to employ a deterministic gammatone filterbank. In contrast to a common gammatone filterbank, our filters are restricted to 2 ms length to allow for low-latency processing. Inspired by the encoder learned by Conv-TasNet, in addition to the logarithmically spaced filters, the proposed filterbank holds multiple gammatone filters at the same center frequency with varying phase shifts. We show that replacing the learned encoder with our proposed multi-phase gammatone filterbank (MP-GTF) even leads to a scale-invariant source-to-noise…
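
The filterbank described in the abstract can be sketched roughly as follows. This is an illustrative construction under stated assumptions, not the authors' implementation: the 8 kHz sample rate, filter order 2, ERB-based bandwidth, center-frequency range, evenly spaced phase shifts, and the function name `mp_gtf` are all hypothetical choices. Only the 2 ms filter length, the logarithmic frequency spacing, the multiple phases per center frequency, and the total of 128 filters come from the page above.

```python
import numpy as np


def mp_gtf(n_filters=128, n_phases=4, fs=8000, length_ms=2.0,
           order=2, f_min=100.0, f_max=3600.0):
    """Build a (n_filters, n_taps) multi-phase gammatone filterbank (sketch)."""
    if n_filters % n_phases != 0:
        raise ValueError("n_filters must be divisible by n_phases")

    # 2 ms at the assumed 8 kHz sample rate gives 16 taps.
    n_taps = int(round(fs * length_ms / 1000.0))
    # Start one sample after zero so the envelope is non-zero at every tap.
    t = np.arange(1, n_taps + 1) / fs

    # Logarithmically spaced center frequencies (range is an assumption).
    n_freqs = n_filters // n_phases
    fc = np.geomspace(f_min, f_max, n_freqs)

    # Several phase shifts per center frequency (even spacing is an assumption).
    phases = 2.0 * np.pi * np.arange(n_phases) / n_phases

    # Bandwidth from the Glasberg-Moore ERB approximation (an assumption here).
    b = 1.019 * (24.7 + fc / 9.265)

    # Gammatone impulse response: t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t + phase)
    envelope = t[None, :] ** (order - 1) * np.exp(-2.0 * np.pi * b[:, None] * t[None, :])
    carrier = np.cos(2.0 * np.pi * fc[:, None, None] * t[None, None, :]
                     + phases[None, :, None])
    bank = (envelope[:, None, :] * carrier).reshape(n_filters, n_taps)

    # Normalize every filter to unit energy.
    bank /= np.linalg.norm(bank, axis=1, keepdims=True)
    return bank, fc


bank, fc = mp_gtf()
```

In a Conv-TasNet-style model, a fixed matrix like `bank` would take the place of the weights of the learned 1-D convolutional encoder, with each row acting as one analysis filter.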

Code & Models

Repositories

sp-uhh/mp-gtf (official)
