Controlling your Attributes in Voice

Xuyuan Li; Zengqiang Shang; Li Wang; Pengyuan Zhang

arXiv:2501.01674·cs.SD·January 6, 2025

Open Access

TL;DR

This paper introduces a GAN-based speaker representation variational autoencoder and a two-stage voice conversion model for controlling speaker attributes such as age and gender in speech. The method needs no parallel data and preserves speech quality and speaker identity.

Contribution

The paper presents an approach to attribute control in speech generation that requires only non-parallel data, combining a GAN-based speaker representation variational autoencoder with a two-stage voice conversion model.

Findings

01 Effective manipulation of speaker age and gender in speech

02 Preservation of speech quality and speaker identity

03 Attribute control achieved without parallel data

Abstract

Attribute control in generative tasks aims to modify personal attributes, such as age and gender, while preserving the identity information in the source sample. Although significant progress has been made in controlling facial attributes in image generation, similar approaches for speech generation remain largely unexplored. This letter proposes a novel method for controlling speaker attributes in speech without parallel data. Our approach consists of two main components: a GAN-based speaker representation variational autoencoder that extracts speaker identity and attributes from the speaker vector, and a two-stage voice conversion model that captures the natural expression of speaker attributes in speech. Experimental results show that our proposed method not only achieves attribute control at the speaker representation level but also enables manipulation of the speaker age and gender at…
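The idea of separating identity from attributes at the speaker representation level can be illustrated with a small sketch. This is not the authors' model: the dimensions, the names `encode`, `decode`, and `edit_attributes`, and the untrained random linear maps are all assumptions made for illustration, and the variational sampling, adversarial training, and voice conversion stages are omitted. It only shows the data flow of keeping an identity code fixed while swapping in new attribute values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for this illustration.
SPK_DIM, ID_DIM, ATTR_DIM = 16, 8, 2  # attributes here: (age, gender)

# Untrained random linear maps standing in for a learned encoder/decoder.
W_id = rng.standard_normal((ID_DIM, SPK_DIM))
W_attr = rng.standard_normal((ATTR_DIM, SPK_DIM))
W_dec = rng.standard_normal((SPK_DIM, ID_DIM + ATTR_DIM))


def encode(speaker_vec):
    """Map a speaker vector to an (identity code, attribute code) pair."""
    return W_id @ speaker_vec, W_attr @ speaker_vec


def decode(identity, attributes):
    """Rebuild a speaker vector from identity and attribute codes."""
    return W_dec @ np.concatenate([identity, attributes])


def edit_attributes(speaker_vec, new_attributes):
    """Keep the identity code, replace the attribute code, and decode."""
    identity, _ = encode(speaker_vec)
    return decode(identity, np.asarray(new_attributes, dtype=float))


source = rng.standard_normal(SPK_DIM)
older = edit_attributes(source, [0.9, 0.0])    # hypothetical "older" setting
younger = edit_attributes(source, [0.1, 0.0])  # hypothetical "younger" setting
print(older.shape, np.linalg.norm(older - younger))
```

In a trained system the edited speaker vector would then condition a voice conversion model; here the two outputs differ only through the attribute code, since the identity code is shared.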

Taxonomy

Topics: Speech and dialogue systems