InaGVAD: a Challenging French TV and Radio Corpus Annotated for Speech Activity Detection and Speaker Gender Segmentation
David Doukhan, Christine Maertens, William Le Personnic, Ludovic Speroni, and Reda Dehak

TL;DR
InaGVAD is a comprehensive French audiovisual corpus with detailed annotations for speech activity and speaker gender, designed to support research in media monitoring and speaker segmentation with benchmark evaluations.
Contribution
The paper introduces InaGVAD, a new challenging French audiovisual corpus with extensive annotations and benchmarks for speech activity detection and speaker gender segmentation.
Findings
Diverse annotation categories including overlap and speaker traits.
Benchmark results for six VAD systems showing varied performance.
Competitive speaker gender segmentation results using transfer learning.
Abstract
InaGVAD is an audio corpus collected from 10 French radio and 18 TV channels categorized into 4 groups: generalist radio, music radio, news TV, and generalist TV. It contains 277 1-minute-long annotated recordings aimed at representing the acoustic diversity of French audiovisual programs and was primarily designed to build systems able to monitor men's and women's speaking time in media. InaGVAD is provided with Voice Activity Detection (VAD) and Speaker Gender Segmentation (SGS) annotations extended with overlap, speaker traits (gender, age, voice quality), and 10 non-speech event categories. Annotation distributions are detailed for each channel category. The dataset is partitioned into a 1h development subset and a 3h37 test subset, allowing fair and reproducible system evaluation. A benchmark of 6 freely available VAD systems is presented, showing diverse abilities based on channel and…
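As an illustration of the speaking-time monitoring use case described in the abstract, here is a minimal Python sketch that totals duration per label from a gender-segmented recording and derives the share of speech time attributed to women. The segment format (start, end, label), the label names, and both function names are hypothetical, chosen only for this example; they are not the corpus's actual annotation format or the authors' tooling. Segments are assumed not to overlap.

```python
from collections import defaultdict

# Hypothetical segment format: (start_seconds, end_seconds, label).
# Label names are illustrative; the corpus's real annotation files may differ.
def speaking_time(segments):
    """Sum the duration of each label over a list of non-overlapping segments."""
    totals = defaultdict(float)
    for start, end, label in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {start}, {end}")
        totals[label] += end - start
    return dict(totals)

def gender_share(totals, female="female", male="male"):
    """Fraction of gendered speech time attributed to women, or None if no speech."""
    speech = totals.get(female, 0.0) + totals.get(male, 0.0)
    if speech == 0:
        return None
    return totals.get(female, 0.0) / speech

# A made-up 1-minute recording, matching the corpus's recording length.
example = [
    (0.0, 12.5, "male"),
    (12.5, 20.0, "music"),
    (20.0, 45.0, "female"),
    (45.0, 60.0, "male"),
]
totals = speaking_time(example)
share = gender_share(totals)
```

Non-speech labels such as "music" are kept in the totals but excluded from the share, so the ratio reflects speech time only.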
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
