Why disentanglement-based speaker anonymization systems fail at preserving emotions?
Ünal Ege Gaznepoglu, Nils Peters

TL;DR
This paper investigates why current disentanglement-based speaker anonymization systems fail to preserve emotional content, identifying the lack of emotion information in intermediate representations as a key factor.
Contribution
The study provides a comprehensive evaluation of a state-of-the-art system, revealing the main causes of emotion loss and highlighting the impact of speaker embeddings and synthesis artifacts.
Findings
Lack of emotion information in intermediate representations is the main cause.
Speaker embeddings learned in a generative context significantly affect emotion preservation.
Synthesis artifacts bias emotion recognition towards anger.
Abstract
Disentanglement-based speaker anonymization involves decomposing speech into a semantically meaningful representation, altering the speaker embedding, and resynthesizing a waveform using a neural vocoder. State-of-the-art systems of this kind are known to remove emotion information. Possible reasons include mode collapse in GAN-based vocoders, unintended modeling and modification of emotions through speaker embeddings, or excessive sanitization of the intermediate representation. In this paper, we conduct a comprehensive evaluation of a state-of-the-art speaker anonymization system to understand the underlying causes. We conclude that the main reason is the lack of emotion-related information in the intermediate representation. The speaker embeddings also have a high impact, if they are learned in a generative context. The vocoder's out-of-distribution performance has a smaller impact.…
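The three-stage pipeline described in the abstract (decompose, alter the speaker embedding, resynthesize) can be sketched as a toy example. This is only an illustration under loud assumptions: the names `Features`, `decompose`, `anonymize_speaker`, `synthesize`, and `anonymize` are hypothetical, the "features" are simple statistics rather than learned representations, and nothing here reflects the evaluated system's actual models. It is meant to show the data flow, in which the synthesis step sees only the intermediate representation and the replaced speaker embedding.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Features:
    content: List[float]  # toy stand-in for the intermediate representation
    speaker: List[float]  # toy stand-in for the speaker embedding


def decompose(waveform: List[float], frame: int = 4) -> Features:
    """Toy analysis: frame-wise means as 'content', global stats as 'speaker'."""
    frames = [waveform[i:i + frame] for i in range(0, len(waveform), frame)]
    content = [sum(f) / len(f) for f in frames]
    mean = sum(waveform) / len(waveform)
    peak = max(abs(x) for x in waveform)
    return Features(content=content, speaker=[mean, peak])


def anonymize_speaker(speaker: List[float], seed: int = 0) -> List[float]:
    """Replace the embedding with a seeded pseudo-speaker of the same size."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in speaker]


def synthesize(features: Features, frame: int = 4) -> List[float]:
    """Toy vocoder: it can only use what is present in `features`."""
    gain = 1.0 + 0.1 * features.speaker[0]
    return [gain * c for c in features.content for _ in range(frame)]


def anonymize(waveform: List[float], seed: int = 0) -> List[float]:
    """Decompose, swap the speaker embedding, then resynthesize."""
    feats = decompose(waveform)
    feats = Features(feats.content, anonymize_speaker(feats.speaker, seed))
    return synthesize(feats)


if __name__ == "__main__":
    wave = [0.1 * i for i in range(16)]
    print(anonymize(wave, seed=1))
```

In this sketch, any property of the input that ends up in neither `content` nor `speaker` cannot be reproduced by `synthesize`, which mirrors, in a very simplified way, the abstract's point about information missing from the intermediate representation.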
Taxonomy
Topics: Speech Recognition and Synthesis · Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection
