Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding

Rui Wang; Liping Chen; Kong AiK Lee; Zhen-Hua Ling

arXiv:2406.08200 · cs.SD · November 13, 2024

Open Access

TL;DR

This paper introduces an asynchronous voice anonymization method that applies adversarial perturbation to the speaker embedding, obscuring speaker identity from machine recognition while preserving how the voice is perceived by human listeners.

Contribution

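The paper employs a speech generation framework with a speaker disentanglement mechanism and applies adversarial perturbation to the speaker embedding, controlling the perturbation intensity so that voice attributes are altered for machine recognition while human perception is retained.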

Findings

01

Speaker attributes were obscured, with their human perception preserved, in 60.71% of utterances.

02

Controlling the intensity of the perturbation lets the method obscure speaker attributes from machine recognition while preserving human perception.

03

Experiments on the LibriSpeech dataset validate the approach.

Abstract

Voice anonymization has been developed as a technique for preserving privacy by replacing the speaker's voice in a speech signal with that of a pseudo-speaker, thereby obscuring the original voice attributes from machine recognition and human perception. In this paper, we focus on altering the voice attributes against machine recognition while retaining human perception. We refer to this as asynchronous voice anonymization. To this end, a speech generation framework incorporating a speaker disentanglement mechanism is employed to generate the anonymized speech. The speaker attributes are altered through adversarial perturbation applied to the speaker embedding, while human perception is preserved by controlling the intensity of the perturbation. Experiments conducted on the LibriSpeech dataset showed that the speaker attributes were obscured with their human perception preserved for…
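The idea of perturbing a speaker embedding under a controlled intensity can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes a hypothetical linear speaker classifier `W` standing in for the machine recognizer, a random vector standing in for the speaker embedding, and a single sign-gradient step (in the style of the fast gradient sign method) bounded by a hypothetical intensity parameter `eps`. All names and values are illustrative.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def speaker_loss(W, emb, speaker_id):
    # Cross-entropy of a toy linear speaker classifier (assumption, not the paper's model).
    return float(-np.log(softmax(W @ emb)[speaker_id]))

def perturb_embedding(W, emb, speaker_id, eps):
    # Gradient of the cross-entropy with respect to the embedding: W^T (p - onehot).
    p = softmax(W @ emb)
    p[speaker_id] -= 1.0
    grad = W.T @ p
    # One sign-gradient step; eps bounds the per-dimension change (the "intensity").
    return emb + eps * np.sign(grad)

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 16))   # hypothetical classifier: 5 speakers, 16-dim embedding
emb = rng.normal(size=16)      # stand-in for a speaker embedding
adv = perturb_embedding(W, emb, speaker_id=2, eps=0.05)

print("loss before:", speaker_loss(W, emb, 2))
print("loss after: ", speaker_loss(W, adv, 2))
print("max change: ", np.max(np.abs(adv - emb)))
```

A smaller `eps` keeps the perturbed embedding closer to the original, which is the lever the abstract describes for preserving human perception; a larger `eps` moves the classifier's decision further from the true speaker. In a real system the perturbed embedding would be passed to the speech generator, which this sketch does not model.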

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing

Methods: Focus