Adapting General Disentanglement-Based Speaker Anonymization for Enhanced Emotion Preservation

Xiaoxiao Miao; Yuxiang Zhang; Xin Wang; Natalia Tomashenko; Donny Cheng Lock Soh; Ian McLoughlin

arXiv:2408.05928 · cs.SD · April 24, 2025

Open Access

TL;DR

This paper enhances speaker anonymization systems to better preserve emotional cues by integrating emotion embeddings and applying an emotion compensation post-processing step, balancing privacy and emotional fidelity.

Contribution

The paper introduces two strategies for improving emotion preservation in speaker anonymization: integrating embeddings from a pre-trained emotion encoder, and applying an emotion compensation step to anonymized speaker embeddings.

Findings

01 Emotion embeddings aid in preserving emotional cues.

02 Emotion compensation improves emotional fidelity with slight privacy trade-offs.

03 Strategies are adaptable for other paralinguistic attributes.

Abstract

A general disentanglement-based speaker anonymization system typically separates speech into content, speaker, and prosody features using individual encoders. This paper explores how to adapt such a system when a new speech attribute, for example, emotion, needs to be preserved to a greater extent. While existing systems are good at anonymizing speaker embeddings, they are not designed to preserve emotion. Two strategies for this are examined. First, we show that integrating emotion embeddings from a pre-trained emotion encoder can help preserve emotional cues, even though this approach slightly compromises privacy protection. Alternatively, we propose an emotion compensation strategy as a post-processing step applied to anonymized speaker embeddings. This conceals the original speaker's identity and reintroduces the emotional traits lost during speaker embedding anonymization.…
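The post-processing idea in the abstract can be pictured with a toy example. The sketch below is not the authors' method: the abstract is truncated here and does not say how the compensation is computed. It assumes embeddings are plain vectors and that the lost emotional traits can be approximated by a single direction vector. The names `emotion_compensate`, `emotion_direction`, and `strength` are hypothetical and introduced only for illustration.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def emotion_compensate(
    anon_spk: Sequence[float],
    emotion_direction: Sequence[float],
    strength: float = 0.5,
) -> List[float]:
    """Shift an anonymized speaker embedding toward an emotion direction.

    ASSUMPTION: a simple additive shift followed by re-normalization.
    This is an illustrative stand-in, not the procedure from the paper.
    """
    if len(anon_spk) != len(emotion_direction):
        raise ValueError("embeddings must have the same dimensionality")
    base = l2_normalize(anon_spk)
    direction = l2_normalize(emotion_direction)
    shifted = [b + strength * d for b, d in zip(base, direction)]
    return l2_normalize(shifted)


# Toy 3-dimensional vectors; real speaker embeddings have far more dimensions.
anon = [1.0, 0.0, 0.0]      # hypothetical anonymized speaker embedding
emo_dir = [0.0, 1.0, 0.0]   # hypothetical direction carrying the emotion cue
compensated = emotion_compensate(anon, emo_dir, strength=0.5)
```

In this toy setup, `strength` controls the trade-off: a larger value moves the embedding further toward the emotion direction and further from the anonymized embedding it started from.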

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing