Privacy versus Emotion Preservation Trade-offs in Emotion-Preserving Speaker Anonymization

Zexin Cai; Henry Li Xinyuan; Ashi Garg; Leibny Paola García-Perera; Kevin Duh; Sanjeev Khudanpur; Nicholas Andrews; Matthew Wiesner

arXiv:2409.03655·eess.AS·September 6, 2024

Open Access

TL;DR

This paper investigates how to balance speaker anonymization with emotion preservation in speech. It finds that the pipelines studied excel at one goal or the other but not both, and concludes that achieving both would require an in-domain emotion recognizer.

Contribution

The study develops several speaker anonymization pipelines in the context of the VoicePrivacy 2024 challenge and analyzes how well each preserves emotion, highlighting the trade-off between privacy and emotion preservation and pointing to in-domain emotion recognizers as a direction for future work.

Findings

01

Approaches excel at either anonymization or emotion preservation, but not both.

02

Training a speaker verification system using only emotion representations is feasible, though the resulting system is only semi-effective.

03

Separating speaker identity from emotion remains a significant challenge.

Abstract

Advances in speech technology now allow unprecedented access to personally identifiable information through speech. To protect such information, the differential privacy field has explored ways to anonymize speech while preserving its utility, including linguistic and paralinguistic aspects. However, anonymizing speech while maintaining emotional state remains challenging. We explore this problem in the context of the VoicePrivacy 2024 challenge. Specifically, we developed various speaker anonymization pipelines and found that approaches excel either at anonymization or at preserving emotional state, but not both simultaneously. Achieving both would require an in-domain emotion recognizer. Additionally, we found that it is feasible to train a semi-effective speaker verification system using only emotion representations, demonstrating the challenge of separating these two modalities.

Taxonomy

Topics: Speech Recognition and Synthesis