Two-Stage Voice Anonymization for Enhanced Privacy
Francesco Nespoli, Daniel Barreda, Joerg Bitzer, Patrick A. Naylor

TL;DR
This paper introduces a two-stage voice anonymization system combining a state-of-the-art model with zero-shot voice conversion to enhance privacy while maintaining speech pitch, and proposes a new metric for evaluation.
Contribution
It presents a novel two-stage anonymization pipeline that improves privacy preservation and utility in speech data, along with a new evaluation metric.
Findings
Strong privacy preservation demonstrated
Effective pitch retention in anonymized speech
New metric for privacy-utility evaluation
Abstract
In recent years, the need for privacy preservation when manipulating or storing personal data, including speech, has become a major issue. In this paper, we present a system addressing the speaker-level anonymization problem. We propose and evaluate a two-stage anonymization pipeline exploiting a state-of-the-art anonymization model described in the Voice Privacy Challenge 2022 in combination with a zero-shot voice conversion architecture able to capture speaker characteristics from a few seconds of speech. We show this architecture can lead to strong privacy preservation while preserving pitch information. Finally, we propose a new compressed metric to evaluate anonymization systems in privacy scenarios with different constraints on privacy and utility.
Taxonomy
Topics: Speech Recognition and Synthesis · Voice and Speech Disorders
