EASY: Emotion-aware Speaker Anonymization via Factorized Distillation

Jixun Yao; Hexin Liu; Eng Siong Chng; Lei Xie

arXiv:2505.15004 · eess.AS · June 2, 2025

Open Access

TL;DR

EASY is a novel speaker anonymization framework that disentangles speaker identity, linguistic content, and emotional state to enhance privacy while preserving speech emotion and content.

Contribution

EASY introduces a sequential disentanglement process that uses factorized distillation to model speaker identity, linguistic content, and emotion in distinct subspaces, so that emotion and content are preserved while the speaker's identity is protected.

Findings

01. Outperforms baseline systems in privacy protection

02. Preserves emotional state and linguistic content effectively

03. Demonstrates robustness on VoicePrivacy Challenge datasets

Abstract

Emotion plays a significant role in speech interaction, conveyed through tone, pitch, and rhythm, enabling the expression of feelings and intentions beyond words to create a more personalized experience. However, most existing speaker anonymization systems employ parallel disentanglement methods, which only separate speech into linguistic content and speaker identity, often neglecting the preservation of the original emotional state. In this study, we introduce EASY, an emotion-aware speaker anonymization framework. EASY employs a novel sequential disentanglement process to disentangle speaker identity, linguistic content, and emotional representation, modeling each speech attribute in distinct subspaces through a factorized distillation approach. By independently constraining speaker identity and emotional representation, EASY minimizes information leakage, enhancing privacy protection…
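
The sequential idea in the abstract (separate one attribute, then disentangle the remainder, with each attribute living in its own subspace) can be pictured with a toy linear analogy. This is only an illustrative sketch, not the authors' method: the actual EASY model and its factorized distillation losses are not specified in this summary. The orthonormal "speaker" and "emotion" bases, the 4-dimensional feature, and every function name below are assumptions made up for the example.

```python
# Toy linear analogy of sequential disentanglement. NOT the paper's model:
# all names, bases, and dimensions here are hypothetical illustrations.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project(vec, basis):
    """Project vec onto the span of an orthonormal basis (list of vectors)."""
    out = [0.0] * len(vec)
    for b in basis:
        coeff = dot(vec, b)
        out = [o + coeff * bi for o, bi in zip(out, b)]
    return out


def subtract(a, b):
    """Element-wise a - b."""
    return [x - y for x, y in zip(a, b)]


def add(*vecs):
    """Element-wise sum of any number of vectors."""
    return [sum(vals) for vals in zip(*vecs)]


def sequential_disentangle(feature, speaker_basis, emotion_basis):
    """Peel off one attribute at a time; each step sees the previous residual."""
    speaker = project(feature, speaker_basis)   # step 1: speaker component
    residual = subtract(feature, speaker)       # what is left without speaker
    emotion = project(residual, emotion_basis)  # step 2: emotion component
    content = subtract(residual, emotion)       # remainder stands in for content
    return speaker, emotion, content


def anonymize(feature, pseudo_speaker, speaker_basis, emotion_basis):
    """Swap the speaker component, keep emotion and content untouched."""
    _, emotion, content = sequential_disentangle(
        feature, speaker_basis, emotion_basis
    )
    return add(pseudo_speaker, emotion, content)


# Assumed orthonormal subspaces in a 4-dimensional feature space.
SPEAKER_BASIS = [[1.0, 0.0, 0.0, 0.0]]
EMOTION_BASIS = [[0.0, 1.0, 0.0, 0.0]]

feature = [0.9, -0.4, 0.3, 0.7]
pseudo_speaker = [-0.5, 0.0, 0.0, 0.0]  # hypothetical replacement identity
anonymized = anonymize(feature, pseudo_speaker, SPEAKER_BASIS, EMOTION_BASIS)
```

In this toy setting the anonymized vector carries the replacement speaker component while its emotion and content components match the original. A real system would have to learn such subspaces from data rather than assume them.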

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing