NBC2: Multichannel Speech Separation with Revised Narrow-band Conformer
Changsheng Quan, Xiaofei Li

TL;DR
This paper introduces NBC2, a multichannel narrow-band speech separation network that processes each STFT frequency independently with a single shared network. It separates speakers by learning to cluster frame-wise spatial vectors, and it uses group batch normalization (GBN) in its hidden layers.
Contribution
The paper presents NBC2, a narrow-band conformer-based network with group batch normalization (GBN) that outperforms state-of-the-art speech separation methods.
Findings
NBC2 significantly outperforms other state-of-the-art methods
Group batch normalization (GBN) improves the signal-to-distortion ratio (SDR) by 3 dB over other normalization methods
The narrow-band network is spectrum-agnostic and performs frame clustering
Abstract
This work proposes a multichannel narrow-band speech separation network. In the short-time Fourier transform (STFT) domain, the proposed network processes each frequency independently, and all frequencies use a shared network. For each frequency, the network performs end-to-end speech separation, namely taking as input the STFT coefficients of microphone signals, and predicting the separated STFT coefficients of multiple speakers. The proposed network learns to cluster the frame-wise spatial/steering vectors that belong to different speakers. It is mainly composed of three components. First, a self-attention network. Clustering of spatial vectors shares a similar principle with the self-attention mechanism in the sense of computing the similarity of vectors and then aggregating similar vectors. Second, a convolutional feed-forward network. The convolutional layers are employed for…
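The narrow-band idea and the attention-as-clustering analogy in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the tensor layout `(batch, freq, time, mics)`, the function names `to_narrow_band` and `self_attention`, the single attention head, and the random weights are all assumptions made for the example.

```python
import numpy as np

def to_narrow_band(stft):
    """Turn a complex STFT of shape (batch, freq, time, mics) into real-valued
    per-frequency sequences of shape (batch * freq, time, 2 * mics).

    Folding the frequency axis into the batch axis is one simple way to make
    every frequency an independent sample for the same (shared) network.
    """
    b, f, t, m = stft.shape
    feats = np.concatenate([stft.real, stft.imag], axis=-1)  # (b, f, t, 2m)
    return feats.reshape(b * f, t, 2 * m)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the time axis.

    x: (n, t, d_in); w_q, w_k, w_v: (d_in, d).
    Each output frame is a weighted average of the value vectors of all
    frames, weighted by similarity, which is what makes attention resemble
    a soft clustering of frames.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (n, t, t)
    scores = scores - scores.max(axis=-1, keepdims=True)      # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)            # softmax over frames
    return attn @ v, attn

# Hypothetical sizes chosen only for the demonstration.
rng = np.random.default_rng(0)
batch, freq, time, mics, d = 2, 5, 7, 4, 8
stft = (rng.standard_normal((batch, freq, time, mics))
        + 1j * rng.standard_normal((batch, freq, time, mics)))

x = to_narrow_band(stft)  # one sequence per (utterance, frequency) pair
w_q, w_k, w_v = (rng.standard_normal((2 * mics, d)) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
```

Because the same weights are applied to every row of `x` and attention only mixes frames within a row, changing the input at one frequency leaves the outputs at all other frequencies unchanged. That is the sense in which the processing here is narrow-band with a shared network.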
Taxonomy
Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Music and Audio Processing
