Spatial-aware Speaker Diarization for Multi-channel Multi-party Meeting

Jie Wang; Yuji Liu; Binling Wang; Yiming Zhi; Song Li; Shipeng Xia; Jiayang Zhang; Feng Tong; Lin Li; Qingyang Hong

arXiv:2209.12002·eess.AS·September 27, 2022

TL;DR

This paper introduces a spatial-aware speaker diarization system that uses multi-channel audio, a new neural network architecture (DMSNet), and spatial features to improve diarization accuracy and overlapped speech detection in multi-party meetings.

Contribution

It proposes DMSNet, a novel multi-stream neural network with attention superdirective beamforming for robust speaker diarization in multi-channel recordings.

Findings

01. Achieved 93.53% accuracy in overlapped speech detection.

02. Reduced diarization error rate from 13.45% to 7.64%.

03. Enhanced robustness of speaker embeddings using spatial information.

Abstract

This paper describes a spatial-aware speaker diarization system for multi-channel multi-party meetings. The diarization system obtains speaker direction information from a microphone array. A speaker spatial embedding is generated from an x-vector and an s-vector derived from superdirective beamforming (SDB), which makes the embedding more robust. Specifically, we propose a novel multi-channel sequence-to-sequence neural network architecture named discriminative multi-stream neural network (DMSNet), which consists of an attention superdirective beamforming (ASDB) block and a Conformer encoder. The proposed ASDB is a self-adapted channel-wise block that extracts the latent spatial features of array audio by modeling interdependencies between channels. We explore DMSNet to address the overlapped speech problem on multi-channel audio and achieve 93.53% accuracy on the evaluation set. By performing DMSNet based…
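
The abstract names superdirective beamforming (SDB) but this excerpt does not give its formulation or the array geometry. As a rough illustration only, the sketch below computes textbook superdirective weights for one frequency bin, i.e. MVDR weights under a spherically diffuse noise model. The function names, the far-field steering convention, the diagonal loading value, and the speed of sound are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def steering_vector(mic_xyz, look_dir, freq_hz, c=343.0):
    """Far-field steering vector for one frequency (hypothetical helper).

    mic_xyz:  (M, 3) microphone positions in metres.
    look_dir: 3-vector pointing from the array towards the source.
    """
    mic_xyz = np.asarray(mic_xyz, dtype=float)
    look_dir = np.asarray(look_dir, dtype=float)
    look_dir = look_dir / np.linalg.norm(look_dir)
    # Microphones closer to the source receive the wavefront earlier.
    advance_s = mic_xyz @ look_dir / c
    return np.exp(2j * np.pi * freq_hz * advance_s)


def superdirective_weights(mic_xyz, look_dir, freq_hz, c=343.0, diag_load=1e-2):
    """Superdirective weights w = G^-1 d / (d^H G^-1 d) for one frequency.

    G is the diffuse-field coherence matrix plus diagonal loading
    (the loading value is an illustrative assumption).
    """
    mic_xyz = np.asarray(mic_xyz, dtype=float)
    d = steering_vector(mic_xyz, look_dir, freq_hz, c)
    # Pairwise microphone distances, shape (M, M).
    dist = np.linalg.norm(mic_xyz[:, None, :] - mic_xyz[None, :, :], axis=-1)
    # Diffuse coherence sin(x)/x with x = 2*pi*f*r/c.
    # np.sinc(t) is sin(pi*t)/(pi*t), hence the argument 2*f*r/c.
    gamma = np.sinc(2.0 * freq_hz * dist / c)
    gamma = gamma + diag_load * np.eye(len(mic_xyz))
    g_inv_d = np.linalg.solve(gamma, d)
    return g_inv_d / (d.conj() @ g_inv_d)
```

For a multi-channel STFT bin `x` of shape `(M,)`, the beamformed output under these conventions would be `np.vdot(w, x)`, which conjugates `w`. The weights satisfy the distortionless constraint `np.vdot(w, d) == 1` in the look direction, so a source at `look_dir` passes unchanged while diffuse noise is attenuated.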
