Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors
Julien Hauret, Malo Olivier, Thomas Joubaud, Christophe Langrenne, Sarah Poirée, Véronique Zimpfer, Éric Bavu

TL;DR
Vibravox is a GDPR-compliant dataset of French speech and physiological sounds captured with five body-conduction sensors and a reference airborne microphone, enabling research on speech recognition, speech enhancement, and speaker verification.
Contribution
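The paper introduces Vibravox, a dataset that pairs recordings from five body-conduction sensors and a reference airborne microphone with annotations of the recording conditions and linguistic transcriptions, supporting research in body-conduction audio processing.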
Findings
Sensor-specific performance insights for speech tasks
Comparison of body-conduction and airborne microphone data
Evaluation of state-of-the-art models on new sensor data
Abstract
Vibravox is a dataset compliant with the General Data Protection Regulation (GDPR) containing audio recordings using five different body-conduction audio sensors: two in-ear microphones, two bone conduction vibration pickups, and a laryngophone. The dataset also includes audio data from an airborne microphone used as a reference. The Vibravox corpus contains 45 hours per sensor of speech samples and physiological sounds recorded by 188 participants under different acoustic conditions imposed by a high order ambisonics 3D spatializer. Annotations about the recording conditions and linguistic transcriptions are also included in the corpus. We conducted a series of experiments on various speech-related tasks, including speech recognition, speech enhancement, and speaker verification. These experiments were carried out using state-of-the-art models to evaluate and compare their performances…
Code & Models
- 🤗 Cnam-LMSSC/vibravox_phonemizers (model) · ♡ 4
- 🤗 Cnam-LMSSC/vibravox_EBEN_models (model) · ♡ 5
- 🤗 Cnam-LMSSC/phonemizer_forehead_accelerometer (model) · 3 dl · ♡ 2
- 🤗 Cnam-LMSSC/phonemizer_headset_microphone (model) · 5 dl · ♡ 4
- 🤗 Cnam-LMSSC/phonemizer_rigid_in_ear_microphone (model) · 2 dl · ♡ 2
- 🤗 Cnam-LMSSC/phonemizer_soft_in_ear_microphone (model) · 4 dl · ♡ 2
- 🤗 Cnam-LMSSC/phonemizer_temple_vibration_pickup (model) · 1 dl · ♡ 2
- 🤗 Cnam-LMSSC/phonemizer_throat_microphone (model) · 5 dl · ♡ 2
- 🤗 Cnam-LMSSC/EBEN_forehead_accelerometer (model) · 9 dl · ♡ 2
- 🤗 Cnam-LMSSC/EBEN_temple_vibration_pickup (model) · 20 dl · ♡ 2
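The per-sensor phonemizer models above follow a regular naming pattern, which the minimal sketch below uses to build a model identifier from a sensor name. The sensor names and the `Cnam-LMSSC` organization come from the list. The mapping of each sensor name to a sensor family from the abstract is an assumption inferred from the names, and the helpers `phonemizer_repo_id` and `body_conduction_sensors` are hypothetical, written only for illustration.

```python
# Organization name as it appears in the model identifiers listed above.
ORG = "Cnam-LMSSC"

# Sensor names taken from the listed phonemizer models.
# ASSUMPTION: the sensor family attached to each name is inferred from the
# name and the abstract's description; it is not stated on this page.
SENSORS = {
    "headset_microphone": "airborne reference microphone",
    "rigid_in_ear_microphone": "in-ear microphone",
    "soft_in_ear_microphone": "in-ear microphone",
    "forehead_accelerometer": "bone-conduction vibration pickup",
    "temple_vibration_pickup": "bone-conduction vibration pickup",
    "throat_microphone": "laryngophone",
}


def phonemizer_repo_id(sensor: str) -> str:
    """Build the identifier of the phonemizer model for one sensor (hypothetical helper)."""
    if sensor not in SENSORS:
        raise ValueError(f"unknown sensor: {sensor!r}")
    return f"{ORG}/phonemizer_{sensor}"


def body_conduction_sensors() -> list[str]:
    """Return the sensor names that are not the airborne reference, sorted by name."""
    return sorted(
        name for name, family in SENSORS.items()
        if family != "airborne reference microphone"
    )


if __name__ == "__main__":
    for name in body_conduction_sensors():
        print(name, "->", phonemizer_repo_id(name))
```

The resulting strings are plain identifiers; the sketch does not download anything, and loading the corresponding models would require a separate client that is not shown here.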
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
Methods: Sparse Evolutionary Training
