Voice Aging with Audio-Visual Style Transfer
Justin Wilson, Sunyeong Park, Seunghye J. Wilson, Ming C. Lin

TL;DR
This paper introduces a voice aging technique that uses style transfer, inspired by face aging methods, to make a speaker's voice sound older or younger while preserving identity. The method is demonstrated in a mobile app.
Contribution
It extends style transfer methods from face aging to voice aging, combining CNN-based age classification with spectrogram transformation for realistic voice aging.
Findings
Successfully aged voices across different age groups
Maintained speaker identity in transformed voices
Implemented a mobile app for real-time voice aging
Abstract
Face aging techniques have used generative adversarial networks (GANs) and style transfer learning to transform one's appearance to look younger/older. Identity is maintained by conditioning these generative networks on a learned vector representation of the source content. In this work, we apply a similar approach to age a speaker's voice, referred to as voice aging. We first analyze the classification of a speaker's age by training a convolutional neural network (CNN) on the speaker's voice and face data from Common Voice and VoxCeleb datasets. We generate aged voices from style transfer to transform an input spectrogram to various ages and demonstrate our method on a mobile app.
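The abstract does not say which style objective is applied to the spectrogram, so the following is only an illustrative sketch. It assumes the Gram-matrix style statistic familiar from image style transfer, computed here over the frequency bins of a magnitude spectrogram. The function names, frame sizes, and the two synthetic tones are hypothetical and are not taken from the paper; the tones stand in for two recordings and make no claim about how age affects pitch.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Short-time Fourier magnitudes, shape (freq_bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft over each frame gives frame_len // 2 + 1 frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T

def gram_matrix(spec):
    """Correlations between frequency bins, averaged over time frames."""
    return spec @ spec.T / spec.shape[1]

def style_loss(spec_a, spec_b):
    """Mean squared difference between the two Gram matrices."""
    diff = gram_matrix(spec_a) - gram_matrix(spec_b)
    return float(np.mean(diff ** 2))

# Toy inputs (assumption): one second of a pure tone at two pitches.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
source = np.sin(2 * np.pi * 110 * t)
target = np.sin(2 * np.pi * 220 * t)

spec_source = magnitude_spectrogram(source)
spec_target = magnitude_spectrogram(target)

print(spec_source.shape)
print(style_loss(spec_source, spec_source))
print(style_loss(spec_source, spec_target))
```

Because the Gram matrix averages over time frames, it summarizes timbre-like statistics without depending on what was said or for how long. In a style transfer setup of this kind, an optimizer or generator would reduce this loss toward a target-age spectrogram while a separate content term keeps the output close to the source speaker.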
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Subtitles and Audiovisual Media
