On Data Sampling Strategies for Training Neural Network Speech Separation Models
William Ravenscroft, Stefan Goetze, Thomas Hain

TL;DR
This paper investigates how applying signal length limits during training affects neural network speech separation models, finding that specific limits can improve performance and training efficiency by increasing data diversity.
Contribution
It provides a detailed analysis of how training signal length (TSL) limits affect speech separation models, demonstrating improved training efficiency and performance with specific TSL settings.
Findings
Applying TSL limits can improve model performance.
Specific TSL limits reduce training time significantly.
Random sampling of waveform start points enhances training diversity.
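As a rough illustration of the last finding, the sketch below crops a training example to a TSL limit using a randomly sampled start point. It is not the authors' implementation: the function name `crop_to_tsl` and its parameters are hypothetical, and it assumes the mixture and its sources are time-aligned NumPy arrays with time on the last axis.

```python
import numpy as np

def crop_to_tsl(mixture, sources, tsl_samples, rng, random_start=True):
    """Crop a mixture and its time-aligned sources to at most tsl_samples.

    Hypothetical helper for illustration: time is assumed to be the last axis.
    """
    n = mixture.shape[-1]
    if n <= tsl_samples:
        # Signals already within the limit are returned unchanged.
        return mixture, sources
    if random_start:
        # Uniformly sample a start index so the crop fits inside the signal.
        start = int(rng.integers(0, n - tsl_samples + 1))
    else:
        # Deterministic alternative: always keep the beginning of the signal.
        start = 0
    end = start + tsl_samples
    # The same window is applied to the mixture and to every source.
    return mixture[..., start:end], sources[..., start:end]
```

Calling this with a fresh draw from `rng` each epoch means a long utterance can contribute a different segment every time it is seen, which is one way to read the diversity argument above. As an assumed example, at an 8 kHz sampling rate a 4 s limit would correspond to `tsl_samples = 32000`.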
Abstract
Speech separation remains an important area of multi-speaker signal processing. Deep neural network (DNN) models have attained the best performance on many speech separation benchmarks. Some of these models can take significant time to train and have high memory requirements. Previous work has proposed shortening training examples to address these issues but the impact of this on model performance is not yet well understood. In this work, the impact of applying these training signal length (TSL) limits is analysed for two speech separation models: SepFormer, a transformer model, and Conv-TasNet, a convolutional model. The WSJ0-2Mix, WHAMR and Libri2Mix datasets are analysed in terms of signal length distribution and its impact on training efficiency. It is demonstrated that, for specific distributions, applying specific TSL limits results in better performance. This is shown to be…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Phonetics and Phonology Research
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Parameterized ReLU · Linear Layer · Layer Normalization · Dense Connections · Position-Wise Feed-Forward Layer · Residual Connection
