Voice Aging with Audio-Visual Style Transfer

Justin Wilson; Sunyeong Park; Seunghye J. Wilson; Ming C. Lin

arXiv:2110.02411 · cs.SD · October 7, 2021

TL;DR

This paper introduces a voice aging technique based on style transfer and inspired by face aging methods. It transforms a speaker's voice to sound older or younger while preserving identity, and is demonstrated in a mobile app.

Contribution

The paper extends style transfer methods from face aging to voice aging, combining CNN-based age classification with spectrogram transformation to produce aged voices.
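
Both the age classifier and the transformation operate on spectrograms, so it may help to see how one is computed. The following is a minimal sketch, not the authors' pipeline: the frame length, hop size, Hann window, and function names are all assumptions chosen for illustration, and a naive DFT stands in for the FFT a real system would use.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames, each holding frame_len // 2 + 1 magnitudes."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT for bin k; fine for a sketch, slow for real audio.
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Toy input (hypothetical): a 128 Hz tone sampled at 1024 Hz.
sample_rate = 1024
tone = [math.sin(2 * math.pi * 128 * t / sample_rate) for t in range(256)]
spec = magnitude_spectrogram(tone)

# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Each bin spans sample_rate / frame_len = 16 Hz here, so the tone's energy concentrates in the bin at index 128 / 16. The resulting time-by-frequency grid is the kind of image-like input a CNN can classify or a style transfer method can modify.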

Findings

01 Successfully aged voices across different age groups

02 Maintained speaker identity in transformed voices

03 Implemented a mobile app for real-time voice aging

Abstract

Face aging techniques have used generative adversarial networks (GANs) and style transfer learning to transform one's appearance to look younger/older. Identity is maintained by conditioning these generative networks on a learned vector representation of the source content. In this work, we apply a similar approach to age a speaker's voice, referred to as voice aging. We first analyze the classification of a speaker's age by training a convolutional neural network (CNN) on the speaker's voice and face data from Common Voice and VoxCeleb datasets. We generate aged voices from style transfer to transform an input spectrogram to various ages and demonstrate our method on a mobile app.
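
The abstract does not specify the style transfer objective. As one illustration of the general idea, the sketch below assumes the classic Gram-matrix formulation from image style transfer: a content term keeps the output close to the source utterance, and a style term pulls channel correlations toward an example from the target age group. The feature maps, function names, and `style_weight` value are hypothetical and not taken from the paper.

```python
def gram_matrix(features):
    """features: list of C channels, each a flat list of T activations.

    Returns the C x C matrix of channel inner products, divided by T.
    """
    t = len(features[0])
    return [
        [sum(a * b for a, b in zip(ci, cj)) / t for cj in features]
        for ci in features
    ]


def flatten(rows):
    """Concatenate a list of rows into one flat list."""
    return [v for row in rows for v in row]


def mse(xs, ys):
    """Mean squared difference between two equal-length flat lists."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def transfer_loss(generated, content, style, style_weight=10.0):
    """Content loss plus weighted Gram-matrix style loss (assumed objective)."""
    content_loss = mse(flatten(generated), flatten(content))
    style_loss = mse(
        flatten(gram_matrix(generated)), flatten(gram_matrix(style))
    )
    return content_loss + style_weight * style_loss


# Hypothetical 2-channel feature maps over 3 time steps.
content_feats = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]  # source speaker
style_feats = [[0.0, 2.0, 0.0], [2.0, 0.0, 2.0]]    # target-age example

loss_at_content = transfer_loss(content_feats, content_feats, style_feats)
loss_at_style = transfer_loss(style_feats, content_feats, style_feats)
```

Starting from the source features, only the style term is nonzero; starting from the style features, only the content term is. An optimizer would search between these extremes, and `style_weight` controls how far the output moves toward the target age at the expense of fidelity to the source.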

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Subtitles and Audiovisual Media