BeamTransformer: Microphone Array-based Overlapping Speech Detection

Siqi Zheng; Shiliang Zhang; Weilong Huang; Qian Chen; Hongbin Suo; Ming Lei; Jinwei Feng; Zhijie Yan

arXiv:2109.04049 · cs.SD · September 10, 2021

TL;DR

BeamTransformer is a novel architecture that combines beamforming and transformer models to improve overlapping speech detection by leveraging spatial filtering and sequence modeling, leading to better source localization and separation.

Contribution

The paper introduces BeamTransformer, a new model that integrates beamforming with transformers for enhanced overlapping speech detection and source separation.
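The contribution pairs a beamformer front end with a transformer. To make the beamforming half concrete, the sketch below steers a fixed delay-and-sum beamformer toward several look directions and produces one "beam" signal per direction. This is only an illustration under stated assumptions: the uniform circular array, the far-field plane-wave model, and the names `circular_array` and `delay_and_sum_beams` are hypothetical choices for this example. The paper's actual beamformer and array geometry may differ.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second, room-temperature air


def circular_array(num_mics: int, radius: float) -> np.ndarray:
    """Return (num_mics, 2) xy positions of a uniform circular array (hypothetical geometry)."""
    angles = 2.0 * np.pi * np.arange(num_mics) / num_mics
    return radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)


def delay_and_sum_beams(signals: np.ndarray, mic_xy: np.ndarray, fs: float,
                        look_angles: np.ndarray) -> np.ndarray:
    """Steer a far-field delay-and-sum beamformer toward each look angle.

    signals: (num_mics, num_samples) time-domain multichannel audio.
    Returns (num_beams, num_samples): one beamformed signal per look direction.
    """
    num_samples = signals.shape[1]
    spectra = np.fft.rfft(signals, axis=1)               # (M, F)
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / fs)      # (F,)
    beams = []
    for theta in look_angles:
        direction = np.array([np.cos(theta), np.sin(theta)])
        # A plane wave from `direction` reaches mic p earlier by (p . direction) / c.
        advance = mic_xy @ direction / SPEED_OF_SOUND     # (M,) seconds
        # Delay each channel by its advance (a phase ramp, i.e. a circular shift)
        # so that sound from the look direction adds coherently.
        steering = np.exp(-2j * np.pi * freqs[None, :] * advance[:, None])
        beam = np.fft.irfft((spectra * steering).mean(axis=0), n=num_samples)
        beams.append(beam)
    return np.stack(beams)
```

Sound arriving from a look direction adds in phase in that beam and partially cancels in the others, so the stack of beams carries direction information that a single channel lacks. A fixed, data-independent beamformer is the simplest choice here; adaptive beamformers such as MVDR are a common alternative.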

Findings

01. BeamTransformer outperforms single-channel methods in overlapping speech detection.

02. It effectively models spatial relationships among signals from different directions.

03. The approach achieves significant gains in source localization and separation accuracy.

Abstract

We propose BeamTransformer, an efficient architecture that leverages the beamformer's edge in spatial filtering and the transformer's capability in contextual sequence modeling. BeamTransformer seeks to optimize the modeling of sequential relationships among signals from different spatial directions. Overlapping speech detection is one of the tasks where such optimization is favorable. In this paper we apply BeamTransformer to detect overlapping segments. Compared to the single-channel approach, BeamTransformer excels at learning the relationships among different beam sequences and is hence able to make predictions not only from the acoustic signals but also from the localization of the source. The results indicate that a successful incorporation of microphone array signals can lead to remarkable gains. Moreover, BeamTransformer takes one step further, as speech from overlapped speakers have…
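The abstract describes a transformer that learns relationships among beam sequences and predicts overlapping segments. The sketch below is a minimal, untrained NumPy illustration of that idea under our own assumptions, not the authors' architecture. Each beam's frames are assumed to be already embedded as vectors. Every (beam, frame) token attends to every other token in one single-head self-attention layer with a residual connection. Beams are then max-pooled per frame, and a logistic output gives a per-frame overlap probability. The beam embedding, pooling choice, layer sizes, and the names `beam_self_attention` and `init_params` are all hypothetical. The weights are random, so the probabilities only demonstrate shapes and data flow, not real predictions.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sinusoidal_positions(num_positions: int, dim: int) -> np.ndarray:
    """Standard sinusoidal encoding of frame index, shape (num_positions, dim)."""
    pos = np.arange(num_positions)[:, None]
    i = np.arange(dim)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))


def init_params(num_beams: int, dim: int, rng: np.random.Generator) -> dict:
    """Random (untrained) weights for the hypothetical single-layer model."""
    scale = 1.0 / np.sqrt(dim)
    return {
        "beam_emb": rng.normal(0.0, scale, (num_beams, dim)),  # marks which beam a token came from
        "w_q": rng.normal(0.0, scale, (dim, dim)),
        "w_k": rng.normal(0.0, scale, (dim, dim)),
        "w_v": rng.normal(0.0, scale, (dim, dim)),
        "w_out": rng.normal(0.0, scale, dim),
        "b_out": 0.0,
    }


def beam_self_attention(features: np.ndarray, params: dict):
    """features: (num_beams, num_frames, dim) per-beam frame embeddings.

    Returns (per-frame overlap probabilities of shape (num_frames,),
             attention matrix over all beam-frame tokens).
    """
    num_beams, num_frames, dim = features.shape
    # Tag each token with its frame position and its beam identity.
    x = (features
         + sinusoidal_positions(num_frames, dim)[None, :, :]
         + params["beam_emb"][:, None, :])
    tokens = x.reshape(num_beams * num_frames, dim)
    q, k, v = tokens @ params["w_q"], tokens @ params["w_k"], tokens @ params["w_v"]
    # Scaled dot-product attention: every token can attend across beams and time.
    attn = softmax(q @ k.T / np.sqrt(dim), axis=-1)
    out = (tokens + attn @ v).reshape(num_beams, num_frames, dim)  # residual connection
    pooled = out.max(axis=0)                                  # (num_frames, dim), pool over beams
    logits = pooled @ params["w_out"] + params["b_out"]       # (num_frames,)
    return 1.0 / (1.0 + np.exp(-logits)), attn


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 50, 32))  # 8 beams, 50 frames, 32-dim embeddings
probs, attn = beam_self_attention(features, init_params(8, 32, rng))
```

Flattening beams and frames into one token sequence lets attention relate a frame in one direction to frames in other directions, which is the kind of cross-beam relationship a single-channel model cannot see. Real systems would train these weights on labeled overlap data and typically stack several multi-head layers.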

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis