Multichannel Speech Separation with Narrow-band Conformer

Changsheng Quan; Xiaofei Li

arXiv:2204.04464 · cs.SD · July 4, 2022

TL;DR

This paper introduces NBC, a multichannel speech separation method built on a narrow-band Conformer. A single network, shared across all frequencies, processes each STFT frequency independently and learns to exploit narrow-band cues such as the clustering of the speakers' spatial vectors.

Contribution

The paper presents a novel narrow-band Conformer network that leverages frequency-independent processing for multichannel speech separation, outperforming existing methods.

Findings

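01. Achieves better speech separation performance than the state-of-the-art methods it is compared against.

02. The Conformer's self-attention is a natural fit for spatial vector clustering, since both compute similarities between vectors and then aggregate the similar ones.

03. The experiments report the improvement over the compared methods as a large margin.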

Abstract

This work proposes a multichannel speech separation method with narrow-band Conformer (named NBC). The network is trained to learn to automatically exploit narrow-band speech separation information, such as spatial vector clustering of multiple speakers. Specifically, in the short-time Fourier transform (STFT) domain, the network processes each frequency independently, and is shared by all frequencies. For one frequency, the network inputs the STFT coefficients of multichannel mixture signals, and predicts the STFT coefficients of separated speech signals. Clustering of spatial vectors shares a similar principle with the self-attention mechanism in the sense of computing the similarity of vectors and then aggregating similar vectors. Therefore, Conformer would be especially suitable for the present problem. Experiments show that the proposed narrow-band Conformer achieves better speech…
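
The narrow-band layout and the clustering analogy described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`to_narrow_band`, `self_attention`, `process_all_frequencies`) and the array sizes are hypothetical, and a parameter-free self-attention stands in for the learned Conformer, which also contains convolution modules and trained projections and predicts the separated STFT coefficients. The sketch only shows that each frequency becomes its own sequence, that the same function is applied to every frequency, and that attention aggregates frames by vector similarity.

```python
import numpy as np


def to_narrow_band(stft):
    """Rearrange a multichannel STFT of shape (channels, freqs, frames)
    into real-valued per-frequency sequences of shape
    (freqs, frames, 2 * channels)."""
    # Put frequency first so that each frequency is a separate sequence.
    x = np.transpose(stft, (1, 2, 0))  # (freqs, frames, channels)
    # Stack real and imaginary parts into one real feature vector per frame.
    return np.concatenate([x.real, x.imag], axis=-1)


def self_attention(seq):
    """Parameter-free scaled dot-product self-attention (an assumption for
    illustration; the real model uses learned projections).

    seq has shape (frames, features). Each output frame is a
    similarity-weighted average of all frames in the same frequency.
    """
    scores = seq @ seq.T / np.sqrt(seq.shape[-1])        # (frames, frames)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ seq, weights


def process_all_frequencies(stft):
    """Apply the same function to every frequency independently."""
    bands = to_narrow_band(stft)
    return np.stack([self_attention(band)[0] for band in bands])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sizes: 4 microphones, 5 frequencies, 20 time frames.
    shape = (4, 5, 20)
    mixture = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    print(process_all_frequencies(mixture).shape)
```

Because the function never mixes information across the frequency axis, changing the input at one frequency leaves the output at every other frequency unchanged, which is the sense in which the processing is narrow-band.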

Code & Models

Repositories

audio-westlakeu/nbss (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing