Dual-path Transformer Based Neural Beamformer for Target Speech Extraction

Aoqi Guo; Sichong Qian; Baoxiang Li; Dazhi Gao

arXiv:2308.15990 · cs.SD · September 8, 2023 · 1 citation

TL;DR

This paper proposes a dual-path transformer neural beamformer that enhances target speech extraction by combining time-domain cross-attention and frequency-domain self-attention, outperforming existing methods with fewer parameters.

Contribution

Introduces a neural beamformer supported by a dual-path transformer. The model operates end-to-end without relying on a pre-separation module, improving performance and reducing model complexity compared to prior approaches.

Findings

01 Outperforms current neural beamforming algorithms in speech separation

02 Reduces model parameter count significantly

03 Operates in a comprehensive end-to-end manner

Abstract

Neural beamformers, which integrate both pre-separation and beamforming modules, have demonstrated impressive effectiveness in target speech extraction. Nevertheless, the performance of these beamformers is inherently limited by the predictive accuracy of the pre-separation module. In this paper, we introduce a neural beamformer supported by a dual-path transformer. Initially, we employ the cross-attention mechanism in the time domain to extract crucial spatial information related to beamforming from the noisy covariance matrix. Subsequently, in the frequency domain, the self-attention mechanism is employed to enhance the model's ability to process frequency-specific details. By design, our model circumvents the influence of pre-separation modules, delivering performance in a more comprehensive end-to-end manner. Experimental results reveal that our model not only outperforms…
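The two attention paths described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes that "time domain" means attention along the frame axis and "frequency domain" means attention along the frequency-bin axis, it omits learned projections, multiple heads, and normalization, and the `query` stream used for cross-attention is a hypothetical placeholder, since the abstract does not say what the covariance features are attended against. All function names are illustrative.

```python
import numpy as np


def spatial_covariance(stft):
    """Per-bin noisy covariance y y^H.

    stft: complex array of shape (channels, freq, time).
    Returns an array of shape (freq, time, channels, channels).
    """
    return np.einsum("cft,dft->ftcd", stft, stft.conj())


def covariance_features(cov):
    """Flatten real and imaginary parts into real features (freq, time, dim)."""
    n_freq, n_time = cov.shape[:2]
    stacked = np.concatenate([cov.real, cov.imag], axis=-1)
    return stacked.reshape(n_freq, n_time, -1)


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def attention(query, key, value):
    """Scaled dot-product attention over the second-to-last axis."""
    dim = query.shape[-1]
    scores = query @ np.swapaxes(key, -1, -2) / np.sqrt(dim)
    return softmax(scores, axis=-1) @ value


def dual_path_block(features, query):
    """Time-axis cross-attention followed by frequency-axis self-attention.

    features, query: real arrays of shape (freq, time, dim).
    `query` is a hypothetical second stream (an assumption of this sketch).
    """
    # Time path: within each frequency bin, queries attend over all frames
    # of the covariance features.
    time_out = attention(query, features, features)  # (freq, time, dim)
    # Frequency path: within each frame, self-attention across frequency bins.
    per_frame = np.swapaxes(time_out, 0, 1)  # (time, freq, dim)
    freq_out = attention(per_frame, per_frame, per_frame)
    return np.swapaxes(freq_out, 0, 1)  # back to (freq, time, dim)


# Toy input: 4 microphones, 5 frequency bins, 6 frames of random STFT data.
rng = np.random.default_rng(0)
C, F, T = 4, 5, 6
stft = rng.standard_normal((C, F, T)) + 1j * rng.standard_normal((C, F, T))

cov = spatial_covariance(stft)
feats = covariance_features(cov)
query = rng.standard_normal(feats.shape)  # hypothetical query stream
out = dual_path_block(feats, query)
print(cov.shape, feats.shape, out.shape)
```

In a trained model the output features would presumably be mapped to beamforming weights. That step is left out here because the abstract does not describe it.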

Code & Models

Repositories

aworselife/dptbf (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing