Self-Supervised Speech Representations Preserve Speech Characteristics while Anonymizing Voices
Abner Hernandez, Paula Andrea Pérez-Toro, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang

TL;DR
This study explores voice conversion based on self-supervised speech representations as a way to anonymize speech data. The goal is to preserve speech recognition accuracy while keeping speech features relevant to health diagnostics extractable.
Contribution
It demonstrates that voice conversion models built on self-supervised speech representations can anonymize voices with minimal impact on speech recognition. The converted speech also retains key characteristics needed for analysis.
Findings
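Converted voices keep word error rates within 1% of those of the original voices
Speaker verification degrades sharply after anonymization: equal error rate rises from 1.52% to 46.24% on LibriSpeech and from 3.75% to 45.84% on VCTK speakers
Speech features relevant to health diagnostics (articulation, prosody, phonation and phonology) can still be extracted from anonymized dysarthric speech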
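The findings report word error rate (WER) for converted voices. WER is the word-level edit distance between a reference transcript and a recognizer's hypothesis, divided by the reference length. The sketch below computes it with standard dynamic programming. It is not the authors' evaluation code. Their recognizer, text normalization and scoring tool are not stated here, so treat this as an illustrative assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Illustrative sketch only: splits on whitespace and applies no text
    normalization (casing, punctuation), which real evaluations usually do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # At the start of iteration i, prev[j] is the edit distance between the
    # first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)
```

To compare original and converted voices, one would run the same recognizer on both versions of each utterance and score each against the same reference transcript. Reading "within 1%" as an absolute gap of at most one percentage point is our assumption. The summary does not say whether the difference is absolute or relative.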
Abstract
Collecting speech data is an important step in training speech recognition systems and other speech-based machine learning models. However, privacy protection is an increasing concern that must be addressed. The current study investigates voice conversion as a method for anonymizing voices. In particular, we train several voice conversion models using self-supervised speech representations, including wav2vec 2.0, HuBERT and UniSpeech. Converted voices retain a low word error rate, within 1% of that of the original voice. Equal error rate increases from 1.52% to 46.24% on the LibriSpeech test set and from 3.75% to 45.84% on speakers from the VCTK corpus, which signifies degraded speaker verification performance. Lastly, we conduct experiments on dysarthric speech data to show that speech features relevant to articulation, prosody, phonation and phonology can be extracted…
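The equal error rate (EER) is the operating point of a speaker verifier where the false acceptance rate equals the false rejection rate. A verifier whose scores carry no speaker information sits near 50% EER, so values around 46% mean the verifier can barely tell speakers apart. The sketch below approximates EER by sweeping a threshold over observed scores. The score lists are hypothetical, and the paper's actual verifier and scoring backend are not described in this summary.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping a decision threshold over all observed scores.

    A trial is accepted when score >= threshold.
    FAR = fraction of impostor trials accepted.
    FRR = fraction of genuine trials rejected.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor score")

    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical scores: well-separated (like original voices) versus
# indistinguishable distributions (like successfully anonymized voices).
separated = equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
overlapping = equal_error_rate([0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.4])
print(separated, overlapping)
```

Production toolkits usually interpolate the ROC curve rather than picking the closest observed threshold. This brute-force sweep favors readability over speed and precision.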
Taxonomy
Topics: Speech Recognition and Synthesis · Voice and Speech Disorders
