SPGM: Prioritizing Local Features for enhanced speech separation performance
Jia Qi Yip, Shengkui Zhao, Yukun Ma, Chongjia Ni, Chong Zhang, Hao Wang, Trung Hieu Nguyen, Kun Zhou, Dianwen Ng, Eng Siong Chng, Bin Ma

TL;DR
This paper introduces SPGM, a simplified speech separation model that prioritizes local features by replacing inter-blocks with a parameter-efficient global modulation, achieving state-of-the-art results with fewer parameters.
Contribution
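The paper proposes the SPGM block, which replaces the inter-blocks of a dual-path model with a parameter-free global pooling module followed by a lightweight modulation module. This yields a single-path model whose transformer layers are all dedicated to local feature modelling.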
Findings
SPGM outperforms Sepformer by 0.5 dB SI-SDRi on WSJ0-2Mix.
SPGM matches recent SOTA performance with up to 8x fewer parameters.
Model and weights are publicly available.
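The findings above are reported in SI-SDRi, the improvement in scale-invariant signal-to-distortion ratio of the separated estimate over the unprocessed mixture. A minimal sketch of the standard metric definition follows, in plain Python; the function names `si_sdr` and `si_sdr_improvement` and the `eps` stabiliser are illustrative assumptions, not code from the paper.

```python
import math

def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length signals."""
    n = len(target)
    # Remove the mean of each signal so a DC offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Project the estimate onto the target: this is the optimally scaled target.
    alpha = sum(a * b for a, b in zip(e, t)) / (sum(b * b for b in t) + eps)
    s_target = [alpha * b for b in t]
    # Everything in the estimate that is not the scaled target counts as distortion.
    noise = [a - s for a, s in zip(e, s_target)]
    num = sum(s * s for s in s_target)
    den = sum(x * x for x in noise) + eps
    return 10.0 * math.log10(num / den + eps)

def si_sdr_improvement(estimate, mixture, target):
    """SI-SDRi: how many dB the estimate gains over the raw mixture."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the metric is called scale-invariant.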
Abstract
Dual-path is a popular architecture for speech separation models (e.g. Sepformer) which splits long sequences into overlapping chunks for its intra- and inter-blocks that separately model intra-chunk local features and inter-chunk global relationships. However, it has been found that inter-blocks, which comprise half a dual-path model's parameters, contribute minimally to performance. Thus, we propose the Single-Path Global Modulation (SPGM) block to replace inter-blocks. SPGM is named after its structure consisting of a parameter-free global pooling module followed by a modulation module comprising only 2% of the model's total parameters. The SPGM block allows all transformer layers in the model to be dedicated to local feature modelling, making the overall model single-path. SPGM achieves 22.1 dB SI-SDRi on WSJ0-2Mix and 20.4 dB SI-SDRi on Libri2Mix, exceeding the performance of…
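To make the structure described in the abstract concrete, here is a rough sketch in plain Python of overlapping chunking, a parameter-free global pooling step, and a small modulation step. The abstract does not specify the pooling axis or the form of the modulation, so everything below is an assumption for illustration: mean pooling over each chunk's frames, averaging those summaries into one context vector, and a sigmoid gate plus shift with one weight per channel. The names `chunk`, `global_pool`, `modulate`, `scale_w`, and `shift_w` are hypothetical and this is not the authors' implementation.

```python
import math

def chunk(frames, size, hop):
    """Split a list of feature vectors into overlapping chunks, zero-padding the last one."""
    dim = len(frames[0])
    chunks, start = [], 0
    while True:
        piece = frames[start:start + size]
        piece = piece + [[0.0] * dim for _ in range(size - len(piece))]
        chunks.append(piece)
        if start + size >= len(frames):
            break
        start += hop
    return chunks

def global_pool(chunks):
    """Parameter-free pooling (assumed form): mean over frames per chunk, then mean over chunks."""
    dim = len(chunks[0][0])
    summaries = [[sum(f[d] for f in c) / len(c) for d in range(dim)] for c in chunks]
    return [sum(s[d] for s in summaries) / len(summaries) for d in range(dim)]

def modulate(chunks, context, scale_w, shift_w):
    """Assumed modulation: per-channel sigmoid gate and shift derived from the pooled context."""
    gate = [1.0 / (1.0 + math.exp(-w * c)) for w, c in zip(scale_w, context)]
    shift = [w * c for w, c in zip(shift_w, context)]
    return [[[x * g + b for x, g, b in zip(f, gate, shift)] for f in c] for c in chunks]

# Toy input: 10 frames with 2 channels each.
frames = [[float(t), 1.0] for t in range(10)]
chunks = chunk(frames, size=4, hop=2)      # 50% overlap between neighbouring chunks
context = global_pool(chunks)              # one vector summarising the whole sequence
out = modulate(chunks, context, scale_w=[0.0, 0.0], shift_w=[0.0, 0.0])
```

In this sketch the pooling step has no weights at all and the modulation step has only two weights per channel, which illustrates how global context can be injected cheaply while the transformer layers stay focused on frames inside each chunk.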
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
