SPGM: Prioritizing Local Features for enhanced speech separation performance

Jia Qi Yip; Shengkui Zhao; Yukun Ma; Chongjia Ni; Chong Zhang; Hao Wang; Trung Hieu Nguyen; Kun Zhou; Dianwen Ng; Eng Siong Chng; Bin Ma

arXiv:2309.12608 · eess.AS · March 12, 2024

TL;DR

This paper introduces SPGM, a simplified speech separation model that prioritizes local features by replacing inter-blocks with a parameter-efficient global modulation block. SPGM outperforms Sepformer and matches recent state-of-the-art models with up to 8x fewer parameters.

Contribution

The paper proposes the SPGM block, which replaces inter-blocks with a parameter-free global pooling module followed by a lightweight modulation module. This yields a single-path model in which all transformer layers focus on local feature modelling.
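As a rough illustration of this contribution, the sketch below shows one way a parameter-free global pooling step can feed a lightweight modulation step. It is not the authors' implementation: mean pooling over every frame of every chunk, a single linear layer, and a sigmoid gate applied channel-wise are all assumptions, and the names global_pool, modulate, spgm_block, weight and bias are hypothetical. Features are plain Python lists shaped [chunk][frame][channel].

```python
import math
from typing import List

Chunks = List[List[List[float]]]  # [chunk][frame][channel]


def global_pool(chunks: Chunks) -> List[float]:
    """Parameter-free pooling (assumed here: mean over every frame of every chunk)."""
    n_channels = len(chunks[0][0])
    total = [0.0] * n_channels
    n_frames = 0
    for chunk in chunks:
        for frame in chunk:
            for c, value in enumerate(frame):
                total[c] += value
            n_frames += 1
    return [t / n_frames for t in total]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def modulate(chunks: Chunks, pooled: List[float],
             weight: List[List[float]], bias: List[float]) -> Chunks:
    """Hypothetical modulation: one linear layer plus sigmoid turns the pooled
    vector into a per-channel gate that rescales every local frame."""
    gate = [sigmoid(sum(w * p for w, p in zip(row, pooled)) + b)
            for row, b in zip(weight, bias)]
    return [[[v * gate[c] for c, v in enumerate(frame)] for frame in chunk]
            for chunk in chunks]


def spgm_block(chunks: Chunks, weight: List[List[float]],
               bias: List[float]) -> Chunks:
    # Only `weight` (C x C) and `bias` (C) are trainable; pooling has no weights.
    return modulate(chunks, global_pool(chunks), weight, bias)


def modulation_param_count(n_channels: int) -> int:
    """Trainable parameters of the assumed single linear gating layer."""
    return n_channels * n_channels + n_channels


if __name__ == "__main__":
    chunks = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
    weight = [[0.0, 0.0], [0.0, 0.0]]
    bias = [0.0, 0.0]
    print(global_pool(chunks))               # [4.0, 5.0]
    print(spgm_block(chunks, weight, bias))  # every value halved (gate = 0.5)
```

Because the gate is computed from a summary of the whole input, every chunk receives global context without an inter-chunk transformer, and the trainable cost of this assumed gate grows only with C squared. The module in the paper and repository may be structured differently.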

Findings

01

SPGM outperforms Sepformer by 0.5 dB SI-SDRi on WSJ0-2Mix.

02

SPGM matches recent SOTA performance with up to 8x fewer parameters.

03

Model and weights are publicly available.
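The findings and the abstract report results as SI-SDRi, the improvement in scale-invariant signal-to-distortion ratio of the separated estimate over the unprocessed mixture. The following is a minimal pure-Python sketch of the commonly used zero-mean definition; the function names and the eps guard are choices made here, not taken from the paper or its repository.

```python
import math
from typing import Sequence

EPS = 1e-8  # guards against log(0) and division by zero (assumed value)


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def si_sdr(estimate: Sequence[float], reference: Sequence[float]) -> float:
    """Scale-invariant SDR in dB, computed on zero-mean signals."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have equal length")
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference: optimal scale alpha.
    alpha = _dot(est, ref) / (_dot(ref, ref) + EPS)
    target = [alpha * r for r in ref]
    noise = [e - t for e, t in zip(est, target)]
    return 10.0 * math.log10((_dot(target, target) + EPS) / (_dot(noise, noise) + EPS))


def si_sdri(estimate: Sequence[float], mixture: Sequence[float],
            reference: Sequence[float]) -> float:
    """SI-SDR improvement: SI-SDR of the estimate minus SI-SDR of the mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


if __name__ == "__main__":
    ref = [1.0, -1.0, 1.0, -1.0]
    est = [1.1, -0.9, 0.9, -1.1]   # reference plus a small error
    mix = [1.5, -0.5, 0.5, -1.5]   # reference plus a stronger interferer
    print(round(si_sdr(est, ref), 2))        # 20.0
    print(round(si_sdri(est, mix, ref), 2))  # 13.98
```

Rescaling the estimate (for example, multiplying it by 3) leaves SI-SDR unchanged, which is why the metric is preferred over plain SNR for separation outputs whose gain is arbitrary.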

Abstract

Dual-path is a popular architecture for speech separation models (e.g., Sepformer). It splits long sequences into overlapping chunks, which are processed by intra-blocks that model local features within each chunk and by inter-blocks that model global relationships across chunks. However, it has been found that inter-blocks, which comprise half of a dual-path model's parameters, contribute minimally to performance. We therefore propose the Single-Path Global Modulation (SPGM) block to replace inter-blocks. SPGM is named after its structure: a parameter-free global pooling module followed by a modulation module that comprises only 2% of the model's total parameters. The SPGM block allows all transformer layers in the model to be dedicated to local feature modelling, making the overall model single-path. SPGM achieves 22.1 dB SI-SDRi on WSJ0-2Mix and 20.4 dB SI-SDRi on Libri2Mix, exceeding the performance of…
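The dual-path segmentation described in the abstract can be illustrated with a small sketch. It assumes 50% overlap between chunks (hop = chunk // 2, a common choice in dual-path models) and zero padding at both ends; the function names are hypothetical. inter_chunk_view gathers the same position from every chunk, i.e. the sequences an inter-block would model, which is the path SPGM replaces.

```python
from typing import List


def split_into_chunks(seq: List[float], chunk: int) -> List[List[float]]:
    """Zero-pad `seq` and cut it into chunks that overlap by 50%."""
    if chunk < 2 or chunk % 2:
        raise ValueError("chunk must be an even integer >= 2")
    hop = chunk // 2
    padded = [0.0] * hop + list(seq) + [0.0] * hop
    remainder = (len(padded) - chunk) % hop
    if remainder:  # extend so the last chunk ends exactly at the padded end
        padded += [0.0] * (hop - remainder)
    return [padded[s:s + chunk] for s in range(0, len(padded) - chunk + 1, hop)]


def inter_chunk_view(chunks: List[List[float]]) -> List[List[float]]:
    """Position k of every chunk, gathered into one sequence per k.
    Intra-blocks model each row of `chunks`; inter-blocks model these columns."""
    return [list(column) for column in zip(*chunks)]


def overlap_add(chunks: List[List[float]], length: int) -> List[float]:
    """Undo split_into_chunks: sum overlapping chunks, drop padding, rescale."""
    chunk = len(chunks[0])
    hop = chunk // 2
    out = [0.0] * (hop * (len(chunks) - 1) + chunk)
    for i, c in enumerate(chunks):
        for j, value in enumerate(c):
            out[i * hop + j] += value
    # With 50% overlap each original sample lies in exactly two chunks.
    return [v / 2.0 for v in out[hop:hop + length]]


if __name__ == "__main__":
    signal = [float(i) for i in range(1, 8)]  # 7 samples
    chunks = split_into_chunks(signal, chunk=4)
    print(len(chunks), chunks[0])                      # 5 [0.0, 0.0, 1.0, 2.0]
    print(overlap_add(chunks, len(signal)) == signal)  # True
```

Intra-blocks see sequences of chunk length, while inter-blocks see one element per chunk; SPGM keeps the former and substitutes global pooling plus modulation for the latter.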

Code & Models

Repositories

https://huggingface.co/yipjiaqi/spgm
Official

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing