Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers

Changsheng Quan; Xiaofei Li

arXiv:2403.07675·cs.SD·June 21, 2024·1 cites

TL;DR

This paper introduces online multichannel speech enhancement networks, derived from the offline SpatialNet, that handle static and moving speakers over long audio streams with inference cost linear in signal length, and uses training strategies that improve length extrapolation.

Contribution

It develops online variants of SpatialNet with linear inference complexity, enhancing long-term streaming speech enhancement for static and moving speakers.

Findings

01

Online SpatialNet achieves superior speech enhancement performance.

02

The proposed methods effectively handle long audio streams.

03

Length extrapolation is significantly improved with training strategies.

Abstract

In this work, we extend our previously proposed offline SpatialNet to long-term streaming multichannel speech enhancement in both static and moving speaker scenarios. SpatialNet exploits spatial information, such as the spatial/steering direction of speech, to discriminate between target speech and interference, and has achieved outstanding performance. The core of SpatialNet is a narrow-band self-attention module used for learning the temporal dynamics of spatial vectors. Towards long-term streaming speech enhancement, we propose to replace the offline self-attention network with online networks that have linear inference complexity with respect to signal length while retaining the capability of learning long-term information. Three variants are developed based on (i) masked self-attention, (ii) Retention, a self-attention variant with linear inference complexity, and (iii) Mamba, a…
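To illustrate why a Retention-style layer can run with linear inference complexity, the following minimal sketch compares a causal, exponentially decayed attention computed in its masked (quadratic) form with an equivalent recurrent form that keeps one fixed-size state per step. It is a simplified single-head illustration and not the paper's implementation: the function names, the scalar decay `gamma`, and the random test data are assumptions, and components that a full Retention layer may use (positional rotation, normalization, gating, multi-scale decay) are omitted.

```python
import random


def retention_parallel(q, k, v, gamma):
    """Masked form: out[n] = sum over m <= n of gamma^(n-m) * (q[n] . k[m]) * v[m].

    Cost grows quadratically with the number of frames T.
    """
    T, d_v = len(q), len(v[0])
    out = [[0.0] * d_v for _ in range(T)]
    for n in range(T):
        for m in range(n + 1):  # causal mask: current and past frames only
            score = sum(a * b for a, b in zip(q[n], k[m]))
            weight = (gamma ** (n - m)) * score
            for j in range(d_v):
                out[n][j] += weight * v[m][j]
    return out


def retention_recurrent(q, k, v, gamma):
    """Recurrent form: state = gamma * state + outer(k[n], v[n]); out[n] = q[n] @ state.

    Each step touches only a d_k x d_v state, so total cost is linear in T.
    """
    T, d_k, d_v = len(q), len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for n in range(T):
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] = gamma * state[i][j] + k[n][i] * v[n][j]
        out.append([sum(q[n][i] * state[i][j] for i in range(d_k))
                    for j in range(d_v)])
    return out


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two T x d matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical toy data: T frames, d-dimensional queries, keys, and values.
random.seed(0)
T, d = 40, 4
q, k, v = ([[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(T)]
           for _ in range(3))

out_parallel = retention_parallel(q, k, v, gamma=0.9)
out_recurrent = retention_recurrent(q, k, v, gamma=0.9)
diff = max_abs_diff(out_parallel, out_recurrent)
print("max abs difference between the two forms:", diff)
```

The two forms agree up to floating-point error, which is what allows such a layer to be trained in the parallel form and run frame by frame in the recurrent form during streaming inference.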

Code & Models

Repositories

audio-westlakeu/nbss
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis