SecureSpeech: Prompt-based Speaker and Content Protection
Belinda Soh Hui Hui, Xiaoxiao Miao, Xin Wang

TL;DR
SecureSpeech introduces a prompt-based speech generation method that anonymizes both speaker identity and spoken content, enhancing privacy while maintaining speech quality and content fidelity.
Contribution
The paper presents a novel prompt-based pipeline for dual speaker and content anonymization in speech synthesis, addressing privacy concerns in speech data.
Findings
Achieves significant privacy protection against speaker re-identification
Maintains high speech quality and content retention
Analyzes bias introduced by different speaker descriptions
Abstract
Given increasing privacy concerns about identity theft and the re-identification of speakers through their spoken content, this paper proposes a prompt-based speech generation pipeline that ensures dual anonymization of both speaker identity and spoken content. This is achieved by 1) generating a speaker identity that is unlinkable to the source speaker and controlled by descriptors, and 2) replacing sensitive content in the original text using a named entity recognition model and a large language model. The pipeline then feeds the anonymized speaker identity and text to a text-to-speech synthesis model to generate high-fidelity, privacy-friendly speech. Experimental results demonstrate significant privacy protection while maintaining a decent level of content retention and audio quality. This paper also investigates the impact of varying speaker descriptions on…
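The two anonymization steps in the abstract can be illustrated with a minimal sketch of how the text-to-speech inputs might be prepared. This is not the authors' implementation. A small hard-coded gazetteer (`TOY_GAZETTEER`) stands in for the named entity recognition model, a fixed surrogate table (`SURROGATES`) stands in for the large language model's rewriting, and the descriptor fields (`gender`, `age`, `pitch`, `pace`) are hypothetical examples of how a speaker description could be composed. The prompt-conditioned TTS model itself is not shown; all function names here are illustrative.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for a named entity recognition (NER) model:
# a tiny gazetteer mapping surface strings to entity labels.
TOY_GAZETTEER = {
    "Alice Tan": "PERSON",
    "Singapore": "LOCATION",
    "Acme Bank": "ORG",
}

# Hypothetical stand-in for the LLM rewriting step: one fixed
# surrogate per entity label instead of a context-aware rewrite.
SURROGATES = {
    "PERSON": "Jordan Lee",
    "LOCATION": "Springfield",
    "ORG": "Northwind Corp",
}


@dataclass
class Entity:
    text: str
    label: str
    start: int
    end: int


def detect_entities(text: str) -> list[Entity]:
    """Return gazetteer matches as entity spans, sorted by start offset."""
    found = []
    for surface, label in TOY_GAZETTEER.items():
        for m in re.finditer(re.escape(surface), text):
            found.append(Entity(surface, label, m.start(), m.end()))
    return sorted(found, key=lambda e: e.start)


def anonymize_text(text: str) -> str:
    """Replace each detected entity with a surrogate of the same label."""
    pieces, cursor = [], 0
    for ent in detect_entities(text):
        if ent.start < cursor:  # skip matches overlapping a replaced span
            continue
        pieces.append(text[cursor:ent.start])
        pieces.append(SURROGATES[ent.label])
        cursor = ent.end
    pieces.append(text[cursor:])
    return "".join(pieces)


def build_speaker_prompt(gender: str, age: str, pitch: str, pace: str) -> str:
    """Compose a natural-language speaker description from descriptors.

    The prompt describes a synthetic voice rather than the source
    speaker, which is what makes the output identity unlinkable.
    """
    return f"A {gender} speaker, {age}, with {pitch} pitch and a {pace} speaking rate."


def prepare_tts_inputs(text: str, gender: str, age: str,
                       pitch: str, pace: str) -> tuple[str, str]:
    """Return (speaker prompt, anonymized text) for a prompt-based TTS model."""
    return build_speaker_prompt(gender, age, pitch, pace), anonymize_text(text)


if __name__ == "__main__":
    prompt, clean = prepare_tts_inputs(
        "Alice Tan moved from Singapore to work at Acme Bank.",
        "female", "middle-aged", "low", "slow",
    )
    print(prompt)
    print(clean)
```

Running the script prints the description "A female speaker, middle-aged, with low pitch and a slow speaking rate." and the rewritten sentence "Jordan Lee moved from Springfield to work at Northwind Corp." A real pipeline would replace the gazetteer with a trained NER model and the surrogate table with an LLM prompt, so that substitutions stay fluent and context-appropriate.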
Taxonomy
Topics: Speech Recognition and Synthesis · Authorship Attribution and Profiling · Digital Media Forensic Detection
