InaGVAD: a Challenging French TV and Radio Corpus Annotated for Speech Activity Detection and Speaker Gender Segmentation

David Doukhan; Christine Maertens; William Le Personnic; Ludovic Speroni; Reda Dehak

arXiv:2406.04429·eess.AS·June 10, 2024

TL;DR

InaGVAD is a comprehensive French audiovisual corpus with detailed annotations for speech activity and speaker gender, designed to support research in media monitoring and speaker segmentation with benchmark evaluations.

Contribution

The paper introduces InaGVAD, a new challenging French audiovisual corpus with extensive annotations and benchmarks for speech activity detection and speaker gender segmentation.

Findings

1. Diverse annotation categories, including overlap and speaker traits.

2. Benchmark results for six VAD systems, showing varied performance.

3. Competitive speaker gender segmentation results using transfer learning.

Abstract

InaGVAD is an audio corpus collected from 10 French radio and 18 TV channels, categorized into four groups: generalist radio, music radio, news TV, and generalist TV. It contains 277 one-minute annotated recordings chosen to represent the acoustic diversity of French audiovisual programs, and was primarily designed to build systems able to monitor men's and women's speaking time in media. InaGVAD is provided with Voice Activity Detection (VAD) and Speaker Gender Segmentation (SGS) annotations, extended with overlap, speaker traits (gender, age, voice quality), and 10 non-speech event categories. Annotation distributions are detailed for each channel category. The dataset is partitioned into a 1h development subset and a 3h37 test subset, allowing fair and reproducible system evaluation. A benchmark of 6 freely available VAD systems is presented, showing diverse abilities based on channel and…
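
As a rough illustration of the speaking-time monitoring use case mentioned in the abstract, the sketch below sums segment durations per label and derives each gender's share of total speech. The segment format `(start_sec, end_sec, label)`, the label names `"male"`, `"female"`, and `"music"`, and the helper names are assumptions made for this example; they are not the corpus's actual annotation format. The first assertion only checks that the stated partition (1h plus 3h37) matches 277 recordings, assuming each lasts exactly one minute.

```python
from collections import defaultdict

# Sanity check on the stated figures, assuming exactly 60 s per recording:
# 1h dev + 3h37 test = 60 + 217 = 277 minutes.
assert 60 + (3 * 60 + 37) == 277


def speaking_time(segments):
    """Sum durations (in seconds) per label.

    `segments` is a hypothetical list of (start_sec, end_sec, label) tuples.
    """
    totals = defaultdict(float)
    for start, end, label in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {(start, end, label)}")
        totals[label] += end - start
    return dict(totals)


def speaking_share(totals, speech_labels=("female", "male")):
    """Return each speech label's fraction of the total speech time."""
    speech = sum(totals.get(label, 0.0) for label in speech_labels)
    if speech == 0:
        return {label: 0.0 for label in speech_labels}
    return {label: totals.get(label, 0.0) / speech for label in speech_labels}


# Toy one-minute excerpt; values are invented for illustration only.
segments = [
    (0.0, 12.5, "male"),
    (12.5, 20.0, "music"),
    (20.0, 50.0, "female"),
    (50.0, 60.0, "male"),
]

totals = speaking_time(segments)
shares = speaking_share(totals)
print(totals)
print(shares)
```

Non-speech labels such as `"music"` are kept in the totals but excluded from the shares, so the gender proportions are computed over speech time only.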

Code & Models

Repositories

ina-foss/InaGVAD (Official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing