Overlapped speech and gender detection with WavLM pre-trained features

Martin Lebourdais; Marie Tahon; Antoine Laurent; Sylvain Meignier

arXiv:2209.04167·cs.SD·September 12, 2022

TL;DR

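This paper uses WavLM pre-trained features to build overlapped speech detection (OSD) and gender detection (GD) systems for French audiovisual media. It reports a state-of-the-art F1-score for OSD on DIHARD and 97.9% gender detection accuracy on French broadcast news, in support of social science research on interactions between women and men.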

Contribution

It introduces a novel application of WavLM features for simultaneous overlapped speech and gender detection in French media datasets.

Findings

1. State-of-the-art F1-score for overlapped speech detection on DIHARD
2. 97.9% accuracy in gender detection on French broadcast news
3. Effective use of WavLM features in multi-task speech analysis

Abstract

This article focuses on overlapped speech and gender detection in order to study interactions between women and men in French audiovisual media (Gender Equality Monitoring project). In this application context, we need to automatically segment the speech signal according to speaker gender and identify when at least two speakers speak at the same time. We propose to use the WavLM model, which has the advantage of being pre-trained on a huge amount of speech data, to build overlapped speech detection (OSD) and gender detection (GD) systems. In this study, we use two different corpora. The DIHARD III corpus is well adapted for the OSD task but lacks gender information. The ALLIES corpus fits the project application context. Our best OSD system is a Temporal Convolutional Network (TCN) with WavLM pre-trained features as input, which reaches a new state-of-the-art F1-score…
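
To make the OSD task and its F1 metric concrete, here is a minimal sketch of how frame-level overlap labels can be derived from speaker segments and then scored. It is not the authors' pipeline: the `(start, end, speaker)` segment format, the function names `overlap_labels` and `f1_score`, and the default 20 ms frame step are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical segment format: (start_seconds, end_seconds, speaker_id).
Segment = Tuple[float, float, str]


def overlap_labels(segments: List[Segment], duration: float,
                   frame_s: float = 0.02) -> List[int]:
    """Return one label per frame: 1 if at least two speakers are active."""
    n_frames = int(round(duration / frame_s))
    labels = []
    for i in range(n_frames):
        t = (i + 0.5) * frame_s  # evaluate activity at the frame centre
        active = {spk for start, end, spk in segments if start <= t < end}
        labels.append(1 if len(active) >= 2 else 0)
    return labels


def f1_score(reference: List[int], hypothesis: List[int]) -> float:
    """Frame-level F1 for the positive (overlap) class."""
    pairs = list(zip(reference, hypothesis))
    tp = sum(1 for r, h in pairs if r == 1 and h == 1)
    fp = sum(1 for r, h in pairs if r == 0 and h == 1)
    fn = sum(1 for r, h in pairs if r == 1 and h == 0)
    if tp == 0:
        return 0.0  # also covers the case with no positive frames at all
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Two speakers overlapping between 1 s and 2 s, scored with 0.5 s frames.
reference = overlap_labels([(0.0, 2.0, "A"), (1.0, 3.0, "B")],
                           duration=3.0, frame_s=0.5)
hypothesis = [0, 1, 1, 0, 0, 0]  # made-up detector output, one hit and one miss
score = f1_score(reference, hypothesis)
```

In this toy case the reference marks the two frames centred at 1.25 s and 1.75 s as overlap. The made-up hypothesis finds one of them and raises one false alarm, so precision and recall are both 0.5.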
