On Data Sampling Strategies for Training Neural Network Speech Separation Models

William Ravenscroft; Stefan Goetze; Thomas Hain

arXiv:2304.07142·cs.SD·June 19, 2023·1 cites

TL;DR

This paper investigates how applying signal length limits during training affects neural network speech separation models, finding that specific limits can improve performance and training efficiency by increasing data diversity.

Contribution

It provides a detailed analysis of how training signal length (TSL) limits affect speech separation models, demonstrating improved training efficiency and performance with specific TSL settings.

Findings

1. Applying TSL limits can improve model performance.
2. Specific TSL limits significantly reduce training time.
3. Randomly sampling waveform start points increases the diversity of the training data.
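Findings 1 to 3 combine a TSL limit with random start-point sampling. The snippet that follows is a minimal sketch of that idea under stated assumptions, not the authors' implementation: `crop_to_tsl` is a hypothetical helper, and the 8 kHz sample rate and 4 s limit are illustrative values chosen here. Each call draws a fresh start point, so a long utterance contributes a different segment every time it is sampled, while the mixture and its reference sources are cut with the same window so the separation targets stay aligned.

```python
import numpy as np


def crop_to_tsl(mixture, sources, tsl_seconds, sample_rate, rng):
    """Crop a mixture and its reference sources to at most tsl_seconds.

    Hypothetical helper: the start point is drawn uniformly at random, so
    repeated calls (e.g. one per epoch) expose different segments of a
    long utterance.
    """
    max_len = int(tsl_seconds * sample_rate)  # TSL limit in samples
    n = mixture.shape[-1]
    if n <= max_len:
        # Already within the limit: keep the whole signal unchanged.
        return mixture, sources
    # Uniform start in [0, n - max_len]; Generator.integers excludes the high end.
    start = int(rng.integers(0, n - max_len + 1))
    end = start + max_len
    # Same window for the mixture and every source so the targets stay aligned.
    return mixture[..., start:end], sources[..., start:end]


# Usage with synthetic signals (assumed 8 kHz sample rate, 4 s limit).
rng = np.random.default_rng(0)
sr = 8000
mix = rng.standard_normal(6 * sr)          # 6 s single-channel mixture
srcs = rng.standard_normal((2, 6 * sr))    # two reference sources
mix_c, srcs_c = crop_to_tsl(mix, srcs, tsl_seconds=4.0, sample_rate=sr, rng=rng)
print(mix_c.shape, srcs_c.shape)  # (32000,) (2, 32000)
```

Always keeping the first `max_len` samples would be the deterministic alternative; drawing the start point at random is what lets the truncated-away parts of long signals still appear across epochs.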

Abstract

Speech separation remains an important area of multi-speaker signal processing. Deep neural network (DNN) models have attained the best performance on many speech separation benchmarks. Some of these models can take significant time to train and have high memory requirements. Previous work has proposed shortening training examples to address these issues, but the impact of this on model performance is not yet well understood. In this work, the impact of applying these training signal length (TSL) limits is analysed for two speech separation models: SepFormer, a transformer model, and Conv-TasNet, a convolutional model. The WSJ0-2Mix, WHAMR and Libri2Mix datasets are analysed in terms of signal length distribution and its impact on training efficiency. It is demonstrated that, for specific distributions, applying specific TSL limits results in better performance. This is shown to be…
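The abstract relates each dataset's signal length distribution to training efficiency. As a rough illustration only (not the paper's analysis), the hypothetical helper `tsl_summary` computes, for a list of per-example lengths and a TSL limit, how many examples the limit truncates and how much audio remains to be processed per epoch. The example lengths are made-up values, not statistics of WSJ0-2Mix, WHAMR or Libri2Mix, and treating seconds of audio as a measure of training cost is an assumption.

```python
import numpy as np


def tsl_summary(lengths_s, tsl_s=None):
    """Summarise how a TSL limit changes one epoch of training data.

    Hypothetical helper. lengths_s: per-example signal lengths in seconds.
    tsl_s: TSL limit in seconds, or None for no limit.
    """
    lengths = np.asarray(lengths_s, dtype=float)
    clipped = lengths if tsl_s is None else np.minimum(lengths, tsl_s)
    truncated = 0.0 if tsl_s is None else float(np.mean(lengths > tsl_s))
    return {
        "fraction_truncated": truncated,              # share of examples cut by the limit
        "seconds_per_epoch": float(clipped.sum()),    # audio actually processed per epoch
        "fraction_of_full_audio": float(clipped.sum() / lengths.sum()),
        "max_length_s": float(clipped.max()),         # longest example after the limit
    }


# Illustrative, made-up lengths (not statistics of any dataset in the paper).
example = tsl_summary([2.0, 3.5, 5.0, 8.0], tsl_s=4.0)
print(example["fraction_truncated"], example["seconds_per_epoch"])  # 0.5 13.5
```

Seconds of audio per epoch is only a crude proxy for cost. In attention-based models such as SepFormer, memory and compute can grow faster than linearly with sequence length, so the longest remaining example also matters.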

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Phonetics and Phonology Research

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Parameterized ReLU · Linear Layer · Layer Normalization · Dense Connections · Position-Wise Feed-Forward Layer · Residual Connection