Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0

Marie Kunešová; Zbyněk Zajíc

arXiv:2210.14755 · eess.AS · May 10, 2023

TL;DR

This paper shows that wav2vec 2.0 is effective for three basic speech classification tasks. On speaker change detection it surpasses previously reported results on four corpora, and a single multitask system covering speaker change, overlapped speech, and voice activity detection reaches state-of-the-art performance on the AMI corpus.

Contribution

The paper introduces a multitask system based on wav2vec 2.0 that outperforms previous methods on multiple speech classification tasks and provides a publicly available implementation.

Findings

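01 Surpasses previously reported results on speaker change detection across four corpora.

02 Achieves state-of-the-art performance on the AMI corpus with a single multitask system covering all three tasks.

03 Achieves comparable speaker change detection performance even when trained on out-of-domain data from an artificially designed dataset.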

Abstract

Self-supervised learning approaches have lately achieved great success on a broad spectrum of machine learning problems. In the field of speech processing, one of the most successful recent self-supervised models is wav2vec 2.0. In this paper, we explore the effectiveness of this model on three basic speech classification tasks: speaker change detection, overlapped speech detection, and voice activity detection. First, we concentrate on only one task -- speaker change detection -- where our proposed system surpasses the previously reported results on four different corpora, and achieves comparable performance even when trained on out-of-domain data from an artificially designed dataset. Then we expand our approach to tackle all three tasks in a single multitask system with state-of-the-art performance on the AMI corpus. The implementation of the algorithms in this paper is publicly…
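To make the three tasks named in the abstract concrete, here is a minimal sketch of how per-frame binary targets for voice activity, overlapped speech, and speaker change could be derived from speaker segments. It is not the authors' labeling scheme. The `(speaker, start, end)` annotation format, the function name `frame_targets`, the hard 0/1 labels, and the `tolerance_sec` window around change points are all assumptions made for illustration. The only fixed fact used is that standard wav2vec 2.0 produces roughly one output frame per 20 ms of 16 kHz audio.

```python
from typing import List, Tuple

# Standard wav2vec 2.0 emits about one feature frame per 20 ms of 16 kHz audio.
FRAME_SEC = 0.02

# Hypothetical annotation format: (speaker_id, start_sec, end_sec).
Segment = Tuple[str, float, float]


def frame_targets(
    segments: List[Segment], duration_sec: float, tolerance_sec: float = 0.04
) -> Tuple[List[int], List[int], List[int]]:
    """Return per-frame binary targets (vad, osd, scd) for one recording."""
    n_frames = int(round(duration_sec / FRAME_SEC))
    centers = [(i + 0.5) * FRAME_SEC for i in range(n_frames)]

    vad, osd = [], []
    for t in centers:
        # Count speakers whose segment covers the center of this frame.
        active = sum(1 for _, start, end in segments if start <= t < end)
        vad.append(int(active >= 1))  # voice activity: at least one speaker
        osd.append(int(active >= 2))  # overlapped speech: two or more speakers

    # Assumption: every segment boundary strictly inside the recording
    # counts as a speaker change point.
    change_points = [
        b for _, start, end in segments for b in (start, end)
        if 0.0 < b < duration_sec
    ]
    # Assumption: frames within tolerance_sec of a change point are positive.
    scd = [
        int(any(abs(t - b) <= tolerance_sec for b in change_points))
        for t in centers
    ]
    return vad, osd, scd


# Two speakers overlapping between 0.8 s and 1.0 s in a 2-second recording.
segments = [("A", 0.0, 1.0), ("B", 0.8, 2.0)]
vad, osd, scd = frame_targets(segments, duration_sec=2.0)
```

In a multitask setup, the three target sequences would typically be stacked so that one model predicts all of them for each frame. How the paper's system combines them is not stated in the abstract.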

Code & Models

Repositories

mkunes/w2v2_audioframeclassification
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques