Controlled AutoEncoders to Generate Faces from Voices
Hao Liang, Lulan Yu, Guikang Xu, Bhiksha Raj, Rita Singh

TL;DR
This paper introduces a controlled autoencoder framework that morphs facial images based on voice input, leveraging learned voice-face correlations to explore how voices influence perceived facial features.
Contribution
It presents a novel guided autoencoder with a gating controller that conditionally modifies faces according to voice characteristics, enabling voice-driven face morphing.
Findings
Voice-guided face morphing shown to be effective
Human evaluation confirms perceptual alignment
Face retrieval experiments support model accuracy
Abstract
Multiple studies have shown a strong correlation between human vocal characteristics and facial features. However, existing approaches simply generate faces from voice, without exploring the set of features that contribute to these observed correlations. A computational methodology to explore this can be devised by rephrasing the question as: "how much would a target face have to change in order to be perceived as the originator of a source voice?" With this in perspective, we propose a framework that morphs a target face in response to a given voice, such that the facial features are implicitly guided by learned voice-face correlations. Our framework includes a guided autoencoder that converts one face to another, controlled by a unique model-conditioning component called a gating controller, which modifies the reconstructed face based on input voice…
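To make the idea of a voice-conditioned gate concrete, here is a minimal sketch in plain Python. It is not the authors' architecture: the abstract does not specify where the gating controller acts or how it is parameterized, so the sketch assumes that a voice embedding is mapped to per-unit sigmoid gates that scale the autoencoder's latent code before decoding. The class name, layer sizes, and random untrained weights are all hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, vec):
    """Dense layer: `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def random_layer(n_out, n_in, rng):
    """Hypothetical untrained parameters, for illustration only."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


class GatedAutoencoderSketch:
    """Toy autoencoder whose latent code is scaled by voice-derived gates.

    Assumption: the gate acts on the latent code. The source only says the
    gating controller modifies the reconstructed face based on the voice.
    """

    def __init__(self, face_dim=8, latent_dim=4, voice_dim=3, seed=0):
        rng = random.Random(seed)
        self.enc = random_layer(latent_dim, face_dim, rng)    # face -> latent
        self.dec = random_layer(face_dim, latent_dim, rng)    # latent -> face
        self.gate = random_layer(latent_dim, voice_dim, rng)  # voice -> gates

    def gates(self, voice):
        # One gate in (0, 1) per latent unit, computed from the voice alone.
        return [sigmoid(a) for a in linear(*self.gate, voice)]

    def forward(self, face, voice):
        latent = [math.tanh(a) for a in linear(*self.enc, face)]
        gated = [g * z for g, z in zip(self.gates(voice), latent)]
        return linear(*self.dec, gated)


model = GatedAutoencoderSketch()
face = [0.1 * i for i in range(8)]           # stand-in for a face vector
out_a = model.forward(face, [1.0, 0.0, 0.0])  # same face, voice A
out_b = model.forward(face, [0.0, 1.0, 0.0])  # same face, voice B
```

With the same input face, the two voice vectors produce different gates and therefore different reconstructions, which is the conditioning behaviour the abstract describes at a high level. A real system would use convolutional encoders, learned voice embeddings, and trained weights.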
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing
