Exploring Detection-based Method For Speaker Diarization @ Ego4D Audio-only Diarization Challenge 2022

Jiahao Wang; Guo Chen; Yin-Dong Zheng; Tong Lu

arXiv:2211.08708·cs.SD·November 17, 2022

TL;DR

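This paper presents a detection-based approach to audio-only speaker diarization: an audio backbone extracts features, and a detection network generates speaker proposals. The method reached 53.85 DER on the test set and ranked 3rd in the Ego4D audio-only diarization challenge 2022.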

Contribution

It introduces a detection-based method for audio-only speaker diarization that combines audio feature extraction with speaker proposal generation.

Findings

01 Achieved 53.85 DER on the test set

02 Ranked 3rd on the leaderboard of the Ego4D audio-only diarization challenge 2022

03 Validated the method on the challenge validation set
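
For context, DER (diarization error rate) is conventionally the sum of false-alarm, missed-speech, and speaker-confusion time divided by the total reference speech time, usually reported as a percentage. A minimal sketch of that arithmetic follows. The function name and the example durations are hypothetical, and the challenge's official scoring settings (for example collar or overlap handling) are not stated on this page.

```python
def diarization_error_rate(false_alarm: float, missed: float,
                           confusion: float, total_speech: float) -> float:
    """Return DER as a percentage from error durations given in seconds.

    All four arguments are durations in the same unit. total_speech is the
    total reference speech time and must be positive.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return 100.0 * (false_alarm + missed + confusion) / total_speech


# Made-up durations for illustration only, not figures from the paper.
example_der = diarization_error_rate(false_alarm=5.0, missed=10.0,
                                     confusion=15.0, total_speech=60.0)
print(f"DER = {example_der:.2f}")
```

Because the numerator can include false alarms, DER is not capped at 100; lower is better.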

Abstract

This technical report describes our submission to the Ego4D audio-only diarization challenge at ECCV 2022. Speaker diarization takes an audio stream as input and outputs segments that are homogeneous in speaker identity, answering the question "Who spoke when?" We explore a detection-based method for the audio-only speaker diarization task. The method first extracts audio features with an audio backbone, then feeds them to a detection-generate network to obtain speaker proposals; postprocessing turns these proposals into the final diarization results. We validate the method on the validation set, and it achieves 53.85 DER on the test set, ranking 3rd on the leaderboard of the Ego4D audio-only diarization challenge 2022.
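
The abstract does not say what the postprocessing step does. As an illustration only, the sketch below shows one plausible way to turn scored speaker proposals into diarization segments: drop low-confidence proposals, then merge overlapping proposals from the same speaker. The proposal layout `(start, end, speaker, score)`, the function name, and both thresholds are assumptions, not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical proposal layout: (start_s, end_s, speaker_id, score).
Proposal = Tuple[float, float, int, float]
Segment = Tuple[float, float, int]


def proposals_to_segments(proposals: List[Proposal],
                          score_threshold: float = 0.5,
                          max_gap: float = 0.0) -> List[Segment]:
    """Filter proposals by score, then merge same-speaker overlaps."""
    # Keep confident proposals with a positive duration.
    kept = [p for p in proposals if p[3] >= score_threshold and p[1] > p[0]]
    # Group by speaker, then order by start time within each speaker.
    kept.sort(key=lambda p: (p[2], p[0]))

    segments: List[Segment] = []
    for start, end, speaker, _score in kept:
        if (segments and segments[-1][2] == speaker
                and start <= segments[-1][1] + max_gap):
            # Overlaps (or nearly touches) the previous segment: extend it.
            prev_start, prev_end, _ = segments[-1]
            segments[-1] = (prev_start, max(prev_end, end), speaker)
        else:
            segments.append((start, end, speaker))

    # Return segments in chronological order.
    segments.sort(key=lambda s: (s[0], s[2]))
    return segments


example = [(0.0, 2.0, 0, 0.9), (1.5, 3.0, 0, 0.8),
           (2.5, 4.0, 1, 0.7), (5.0, 6.0, 0, 0.2)]
print(proposals_to_segments(example))
```

Segments from different speakers are allowed to overlap in time here, which lets the output represent overlapping speech.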

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques