spINAch: A Diachronic Corpus of French Broadcast Speech Controlled for Speakers' Age and Gender
Simon Devauchelle, David Doukhan, Rémi Uro, Lucas Ondel Yang, Valentin Pelloin, Olympia Imbert-Brégégère, Véronique Lefort, Kévin Picard, Emeline Seignobos, Albert Rilliard

TL;DR
spINAch is a diachronic corpus of French broadcast speech spanning 60 years (1955-2015) and balanced for speakers' age and gender, enabling phonetic and linguistic studies of how the language has evolved across age groups and genders.
Contribution
This work introduces a large, balanced, and automatically transcribed corpus of French speech from 1955 to 2015, facilitating diachronic phonetic research.
Findings
Voice pitch evolution over time does not differ by gender.
Neutralization of the /a/-/ɑ/ opposition observed over the period.
Corpus enables studies of phonetic change in Parisian French.
Abstract
We present spINAch, a large diachronic corpus of French speech from radio and television archives, balanced by speakers' gender and age (20-95 years old) and spanning 60 years, from 1955 to 2015. The dataset includes over 320 hours of recordings from more than two thousand speakers. We describe the methodology for building the corpus, with a focus on the acoustic quality of the collected samples. The data were automatically transcribed and phonetically aligned to allow studies at the phonemic level. More than 3 million oral vowels have been analyzed to provide their fundamental frequency and formant values. The corpus, available to the community for research purposes, is valuable for describing the evolution of Parisian French with respect to speakers' gender and age. The presented analyses also demonstrate that the diachronic nature of the corpus allows the observation of various phonetic…
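The abstract states that fundamental frequency (F0) was measured for every aligned oral vowel but does not say which pitch tracker was used. Purely as an illustration, the sketch below estimates F0 for one voiced frame with the classic autocorrelation method: it picks the lag with the strongest self-similarity inside a plausible pitch range and converts it to Hz. The function name `estimate_f0_autocorr` and the 50-500 Hz search range are assumptions made for this example; this is not the authors' pipeline.

```python
import numpy as np


def estimate_f0_autocorr(frame: np.ndarray, sr: int,
                         fmin: float = 50.0, fmax: float = 500.0) -> float:
    """Estimate F0 (Hz) of a single voiced frame via the autocorrelation peak.

    Hypothetical helper for illustration: assumes the frame is voiced and
    spans at least one period of the lowest pitch of interest (sr / fmin samples).
    """
    # Remove the DC offset so the autocorrelation reflects the periodic part only
    x = np.asarray(frame, dtype=float) - np.mean(frame)
    # Full autocorrelation; keep the non-negative lags (index = lag in samples)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    # Only search lags that correspond to pitches between fmin and fmax
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), len(ac) - 1)
    if lag_max <= lag_min:
        raise ValueError("frame too short for the requested pitch range")
    # The strongest peak in that window is taken as the pitch period
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag
```

In practice such an estimator would be applied to short frames inside each vowel interval given by the phonetic alignment, with one value per vowel summarized afterwards (for example, the median). Production tools such as Praat's autocorrelation pitch tracker add windowing, voicing decisions, and octave-error correction that this sketch omits.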
Taxonomy
Topics: Phonetics and Phonology Research · Linguistic Variation and Morphology · Voice and Speech Disorders
