On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments

William Ravenscroft; Stefan Goetze; Thomas Hain

arXiv:2310.06125 · cs.SD · October 11, 2023

TL;DR

This paper applies time domain conformer (TD-Conformer) models to monaural speech separation in noisy, reverberant environments and shows that, for realistic shorter signal lengths, they are more efficient than dual-path models when controlling for feature dimension.

Contribution

The paper introduces time domain conformers with subsampling layers for speech separation, reporting improved computational efficiency and state-of-the-art performance on benchmark datasets.
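To illustrate why subsampling can help, here is a minimal sketch of how shortening the sequence shrinks the self-attention score matrix. It assumes a layer that keeps roughly one frame per integer `factor` and counts only the L x L score entries of a single attention head. The function names and the factor are hypothetical, and the paper's actual subsampling layers and cost model may differ.

```python
def subsampled_length(seq_len: int, factor: int) -> int:
    """Sequence length after keeping one frame per `factor` (ceiling division).

    Assumption: an idealised strided layer; padding and kernel effects are ignored.
    """
    if seq_len <= 0 or factor <= 0:
        raise ValueError("seq_len and factor must be positive")
    return -(-seq_len // factor)


def attention_score_entries(seq_len: int) -> int:
    """Number of entries in one head's L x L self-attention score matrix."""
    return seq_len * seq_len


def attention_saving(seq_len: int, factor: int) -> float:
    """Ratio of score-matrix sizes before and after subsampling."""
    full = attention_score_entries(seq_len)
    reduced = attention_score_entries(subsampled_length(seq_len, factor))
    return full / reduced


if __name__ == "__main__":
    # Hypothetical example: 4000 encoded frames, subsampled by a factor of 4.
    print(subsampled_length(4000, 4))
    print(attention_saving(4000, 4))
```

Because full self-attention is quadratic in sequence length, reducing the length by a factor of s reduces this particular term by about s squared. Other layers in a conformer block scale differently, so the overall saving would be smaller.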

Findings

01 Conformers are more efficient than dual-path networks for realistic shorter signal lengths, when controlling for feature dimension.

02 Subsampling layers further improve computational efficiency.

03 The best model achieved 14.6 dB and 21.2 dB SISDR improvements on benchmarks.
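For readers unfamiliar with the metric behind the reported figures, the following is a minimal sketch of scale-invariant signal-to-distortion ratio (SISDR, often written SI-SDR) and its improvement over the unprocessed mixture. It uses the standard definition with mean removal. The function names are illustrative, and this is not the evaluation code used by the authors.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length sample sequences."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)
    # Remove the mean so a DC offset does not affect the result.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference to get the scaled target.
    ref_energy = sum(r * r for r in ref) + eps
    scale = sum(e * r for e, r in zip(est, ref)) / ref_energy
    target = [scale * r for r in ref]
    # Everything left over after the projection counts as distortion.
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(v * v for v in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


def si_sdr_improvement(estimate, mixture, reference):
    """SISDR gain of the separated estimate over the unprocessed mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)
```

An improvement of 20 dB, for example, would mean the residual energy relative to the target is 100 times lower in the separated output than in the input mixture.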

Abstract

Speech separation remains an important topic for multi-speaker technology researchers. Convolution augmented transformers (conformers) have performed well for many speech processing tasks but have been under-researched for speech separation. Most recent state-of-the-art (SOTA) separation models have been time-domain audio separation networks (TasNets). A number of successful models have made use of dual-path (DP) networks which sequentially process local and global information. Time domain conformers (TD-Conformers) are an analogue of the DP approach in that they also process local and global context sequentially but have a different time complexity function. It is shown that for realistic shorter signal lengths, conformers are more efficient when controlling for feature dimension. Subsampling layers are proposed to further improve computational efficiency. The best TD-Conformer…

Code & Models

Repositories

jwr1995/pubsep (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing

Methods: Convolution