Analysis of Speech Temporal Dynamics in the Context of Speaker Verification and Voice Anonymization
Natalia Tomashenko, Emmanuel Vincent, Marc Tommasi

TL;DR
This paper examines how speech timing features affect speaker verification and voice anonymization, highlighting the need to modify phonetic durations to enhance privacy and prevent speaker identification.
Contribution
It introduces metrics based on phoneme durations for speaker verification and demonstrates their role in speaker identity leakage in both original and anonymized speech.
Findings
Phoneme durations leak speaker identity information.
A speaker's speech rate and, more importantly, phonetic duration characteristics must be taken into account to protect privacy.
Modifying phonetic durations can improve voice anonymization.
Abstract
In this paper, we investigate the impact of speech temporal dynamics on automatic speaker verification and speaker voice anonymization. We propose several metrics to perform automatic speaker verification based only on phoneme durations. Experimental results demonstrate that phoneme durations leak some speaker information and can reveal speaker identity from both original and anonymized speech. This work therefore emphasizes the importance of taking into account the speaker's speech rate and, more importantly, the speaker's phonetic duration characteristics, as well as the need to modify them in order to develop anonymization systems with strong privacy protection capacity.
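To make the idea of comparing speakers using only phoneme durations concrete, here is a minimal sketch. It is not one of the paper's metrics, which this summary does not specify. It assumes each utterance has already been aligned into (phoneme, duration in seconds) pairs, and the function names `mean_durations` and `duration_distance` are hypothetical.

```python
from collections import defaultdict


def mean_durations(segments):
    """Return the mean duration (seconds) of each phoneme label.

    `segments` is an iterable of (phoneme, duration) pairs, assumed to
    come from a forced alignment; that input format is an assumption.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for phoneme, duration in segments:
        totals[phoneme] += duration
        counts[phoneme] += 1
    return {p: totals[p] / counts[p] for p in totals}


def duration_distance(segments_a, segments_b):
    """Mean absolute difference of per-phoneme mean durations.

    Only phonemes observed in both inputs are compared. A lower value
    means more similar timing. Illustrative only, not the paper's metric.
    """
    profile_a = mean_durations(segments_a)
    profile_b = mean_durations(segments_b)
    shared = set(profile_a) & set(profile_b)
    if not shared:
        raise ValueError("no phonemes in common")
    return sum(abs(profile_a[p] - profile_b[p]) for p in shared) / len(shared)


# Toy data, invented for illustration.
enrollment = [("a", 0.10), ("a", 0.12), ("t", 0.05)]
trial = [("a", 0.20), ("t", 0.07)]
distance = duration_distance(enrollment, trial)
```

A verification decision could then compare such a distance against a threshold tuned on held-out data. An anonymization system that changes the voice but leaves durations untouched would leave this kind of score unchanged, which is the leakage the abstract describes.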
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
