Speaker-agnostic Emotion Vector for Cross-speaker Emotion Intensity Control

Masato Murata; Koichi Miyazaki; Tomoki Koriyama

arXiv:2507.03382·cs.SD·July 8, 2025

TL;DR

This paper introduces a speaker-agnostic emotion vector for cross-speaker emotion intensity control in speech synthesis. The method maintains speaker consistency and speech quality, including for unseen speakers.

Contribution

The paper proposes a novel speaker-agnostic emotion vector that captures shared emotional expressions, improving cross-speaker emotion control over prior methods.

Findings

01. Successful cross-speaker emotion intensity control with maintained speaker consistency

02. Effective in unseen speaker scenarios

03. High speech quality and controllability achieved

Abstract

Cross-speaker emotion intensity control aims to generate emotional speech of a target speaker with desired emotion intensities using only their neutral speech. A recently proposed method, emotion arithmetic, achieves emotion intensity control using a single-speaker emotion vector. Although this prior method has shown promising results in the same-speaker setting, it lost speaker consistency in the cross-speaker setting due to mismatches between the emotion vector of the source and target speakers. To overcome this limitation, we propose a speaker-agnostic emotion vector designed to capture shared emotional expressions across multiple speakers. This speaker-agnostic emotion vector is applicable to arbitrary speakers. Experimental results demonstrate that the proposed method succeeds in cross-speaker emotion intensity control while maintaining speaker consistency, speech quality, and…
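The abstract does not say how the emotion vector is computed or how it is made speaker-agnostic. As a rough illustration only, the sketch below assumes the vector is a difference between emotional and neutral model parameters, that the speaker-agnostic version is a plain average of per-speaker vectors, and that intensity is a scalar multiplier. All function names, the `Params` type, and the toy numbers are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical flat parameter store: parameter name -> list of values.
Params = Dict[str, List[float]]


def emotion_vector(emotional: Params, neutral: Params) -> Params:
    """Per-speaker emotion vector, ASSUMED to be emotional minus neutral parameters."""
    return {
        name: [e - n for e, n in zip(emotional[name], neutral[name])]
        for name in neutral
    }


def speaker_agnostic_vector(vectors: List[Params]) -> Params:
    """ASSUMED aggregation: element-wise mean of the per-speaker vectors."""
    count = len(vectors)
    return {
        name: [sum(vals) / count for vals in zip(*(v[name] for v in vectors))]
        for name in vectors[0]
    }


def apply_emotion(target_neutral: Params, vector: Params, intensity: float) -> Params:
    """Shift a target speaker's neutral parameters along the vector, scaled by intensity."""
    return {
        name: [n + intensity * d for n, d in zip(target_neutral[name], vector[name])]
        for name in target_neutral
    }


# Toy data for two source speakers and one target speaker (made-up values).
vec_a = emotion_vector({"w": [1.0, 3.0]}, {"w": [0.0, 1.0]})
vec_b = emotion_vector({"w": [5.0, 2.0]}, {"w": [2.0, 2.0]})
shared = speaker_agnostic_vector([vec_a, vec_b])
result = apply_emotion({"w": [10.0, 10.0]}, shared, intensity=0.5)
print(shared, result)
```

In this sketch, an intensity of 0 leaves the target speaker's neutral parameters unchanged, and larger values move them further along the shared direction. Only neutral data from the target speaker is needed, which matches the setting the abstract describes.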
