Speaker Diarization Based on Multi-channel Microphone Array in Small-scale Meeting

Yuxuan Du; Ruohua Zhou

arXiv:2210.14644 · cs.SD · October 27, 2022 · 1 cites

Open Access

TL;DR

This paper introduces a multi-channel microphone array-based speaker diarization method for small meetings, utilizing spatial information and speech enhancement to significantly improve diarization accuracy.

Contribution

It proposes a novel diarization approach combining spatial vectors, speech enhancement, SRP-PHAT, and reclustering, specifically optimized for small-scale meetings with up to four speakers.
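The reclustering step can be pictured with a minimal sketch: once a speaker count is available, segment embeddings are merged until exactly that many clusters remain. Everything here is an assumption made for illustration, not the authors' implementation: the function names `cosine_distance` and `recluster` are hypothetical, the embeddings stand in for whatever per-segment vectors the system uses (for example d-vectors, possibly joined with spatial features), and average-linkage agglomerative clustering with cosine distance is simply one common choice.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def recluster(embeddings, num_speakers):
    """Average-linkage agglomerative clustering stopped at num_speakers clusters.

    embeddings: list of per-segment vectors (hypothetical input).
    num_speakers: target cluster count, assumed to come from an external estimate.
    Returns one integer cluster label per segment.
    """
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > num_speakers:
        best = None
        # Find the pair of clusters with the smallest average pairwise distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dists = [
                    cosine_distance(embeddings[i], embeddings[j])
                    for i in clusters[a]
                    for j in clusters[b]
                ]
                avg = sum(dists) / len(dists)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

Fixing the cluster count this way removes the need for a distance threshold, which is the usual reason a reliable speaker count helps clustering. The exhaustive pair search is cubic in the number of segments and is only meant for small illustrative inputs.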

Findings

01 Achieved significant reduction in diarization error rate (DER).

02 Demonstrated effectiveness on the AMI corpus.

03 Enhanced speaker separation using spatial and clustering techniques.
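For readers unfamiliar with the metric named above, DER is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The sketch below is a simplified frame-level version, assuming one label per frame, no overlapping speech, and no scoring collar; the function name `frame_der` is hypothetical, and this is not the scoring setup used in the paper. The best mapping between hypothesis and reference speakers is found by brute force, which is feasible for the small speaker counts discussed here.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level DER for single-speaker-per-frame label sequences.

    Each sequence holds one label per frame; None marks non-speech.
    Assumes no overlapping speech and no collar (a simplification).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    pairs = list(zip(reference, hypothesis))
    ref_speech = sum(r is not None for r, _ in pairs)
    if ref_speech == 0:
        raise ValueError("reference contains no speech frames")

    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)
    both_speech = sum(r is not None and h is not None for r, h in pairs)

    ref_ids = sorted({r for r in reference if r is not None})
    hyp_ids = sorted({h for h in hypothesis if h is not None})
    # Pad with None so surplus hypothesis speakers can stay unmapped.
    targets = ref_ids + [None] * max(0, len(hyp_ids) - len(ref_ids))

    # Try every one-to-one mapping and keep the one with most correct frames.
    best_correct = 0
    for perm in permutations(targets, len(hyp_ids)):
        mapping = dict(zip(hyp_ids, perm))
        correct = sum(
            r is not None and h is not None and mapping[h] == r for r, h in pairs
        )
        best_correct = max(best_correct, correct)

    confusion = both_speech - best_correct
    return (missed + false_alarm + confusion) / ref_speech
```

Because the denominator counts only reference speech, false alarms can push the value above 1.0.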

Abstract

Small-scale meetings account for a large proportion of speaker diarization scenarios. When microphone arrays are used as the recording device, their spatial information is usually ignored by most researchers. In this paper, inspired by the clustering method that combines d-vectors with microphone-array spatial vectors, we propose a diarization method that uses multi-channel microphone arrays for meetings with no more than 4 speakers. We use speech enhancement to preprocess the audio from the microphone array. The Steered-Response Power Phase Transform (SRP-PHAT) algorithm is employed to obtain a more accurate estimate of the speakers, and the resulting number of speakers is then used to recluster the speech segments for better performance. Finally, we fuse our system with DOVER-Lap to get the best result. We evaluated our system on the AMI corpus. Compared with the best experimental results so far,…
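The abstract does not say how SRP-PHAT is configured, so the following is only a generic illustration of the technique, assuming NumPy, a planar array, a far-field source, and a grid of candidate azimuths. The function names `gcc_phat` and `srp_phat` and all parameters are hypothetical. For each microphone pair the phase-transform-weighted cross-correlation (GCC-PHAT) is computed, and the steered response for a candidate direction is the sum of those correlations at the lags that direction would produce.

```python
from itertools import combinations

import numpy as np


def gcc_phat(x, y, n_fft):
    """GCC-PHAT of x against y; the peak index is the delay of x relative to y."""
    X = np.fft.rfft(x, n=n_fft)
    Y = np.fft.rfft(y, n=n_fft)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, drop magnitude
    return np.fft.irfft(cross, n=n_fft)  # index k holds lag k modulo n_fft


def srp_phat(signals, mic_positions, azimuths_deg, fs, c=343.0, n_fft=None):
    """Steered response power over candidate azimuths (far-field, 2-D array).

    signals: array of shape (num_mics, num_samples).
    mic_positions: array of shape (num_mics, 2), in metres.
    Returns one power value per candidate azimuth.
    """
    # Default to the frame length (circular correlation); pass a larger
    # n_fft to zero-pad when lags are not small relative to the frame.
    n_fft = n_fft or signals.shape[1]
    az = np.deg2rad(np.asarray(azimuths_deg, dtype=float))
    directions = np.stack([np.cos(az), np.sin(az)], axis=1)  # (A, 2)
    # Relative arrival time per direction and mic: mics further along the
    # source direction hear the wavefront earlier.
    arrival = -(directions @ mic_positions.T) / c  # (A, M), seconds
    power = np.zeros(len(az))
    for i, j in combinations(range(signals.shape[0]), 2):
        cc = gcc_phat(signals[i], signals[j], n_fft)
        lags = np.rint((arrival[:, i] - arrival[:, j]) * fs).astype(int)
        power += cc[lags % n_fft]  # negative lags wrap to the end of cc
    return power
```

The azimuth with the largest returned power is the estimated direction for that frame. How such direction estimates would then be turned into a speaker count is not described in the text above and is left open here.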

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis