Monaural Multi-Speaker Speech Separation Using Efficient Transformer Model

S. Rijal; R. Neupane; S. P. Mainali; S. K. Regmi; S. Maharjan

arXiv:2308.00010·cs.SD·February 19, 2026

TL;DR

This paper presents a Transformer-based model for monaural two-speaker speech separation, trained on the LibriMix dataset, that aims to reduce computational complexity with minimal loss in separation performance.

Contribution

It presents a speech-separation model built on the Transformer architecture and its efficient forms, targeting lower computational cost than prevalent separation models while keeping performance close to theirs.

Findings

01 Reduces computational complexity significantly

02 Maintains high speech separation accuracy

03 Shows promising results on LibriMix dataset

Abstract

The cocktail party problem is the scenario in which it is difficult to separate or distinguish an individual speaker from the mixed speech of several speakers. Considerable research is ongoing in this field, but model size and complexity are being traded off against the accuracy and robustness of speech separation. "Monaural multi-speaker speech separation" presents a speech-separation model based on the Transformer architecture and its efficient forms. The model has been trained on the LibriMix dataset, which contains utterances from diverse speakers. The model separates two distinct speaker sources from a mixed audio input. The developed model aims to reduce the computational complexity of speech separation with minimal tradeoff in performance relative to prevalent speech separation models, and it has shown significant progress towards that goal. This project foresees, a…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Phonetics and Phonology Research

Methods: Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Layer Normalization · Softmax · Linear Layer · Adam · Dense Connections · Label Smoothing · Dropout
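
Of the methods listed above, softmax-based attention is the usual source of a Transformer's computational cost, which is the quantity the abstract says the model tries to reduce. The following is a minimal sketch of standard single-head scaled dot-product attention in plain Python, not the paper's model: the function names and the toy `frames` input are illustrative assumptions, and this page does not say which efficient attention variant the authors use. It shows that a sequence of T frames yields a T × T weight matrix, so cost grows quadratically with sequence length.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Standard attention: softmax(q k^T / sqrt(d)) v, on lists of lists."""
    d = len(q[0])
    # One score per (query frame, key frame) pair: a T x T matrix.
    scores = [
        [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d) for k_row in k]
        for q_row in q
    ]
    weights = [softmax(row) for row in scores]
    # Each output frame is a weighted average of the value frames.
    out = [
        [sum(w * v_row[j] for w, v_row in zip(w_row, v)) for j in range(len(v[0]))]
        for w_row in weights
    ]
    return out, weights


# Hypothetical toy input: T = 3 frames with 2 features each (self-attention).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = scaled_dot_product_attention(frames, frames, frames)

# Number of attention weights grows as T * T.
score_entries = len(weights) * len(weights[0])
print(score_entries)
```

Doubling the number of frames quadruples `score_entries`, which is why long audio sequences motivate cheaper approximations to full attention.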