Controlled AutoEncoders to Generate Faces from Voices

Hao Liang; Lulan Yu; Guikang Xu; Bhiksha Raj; Rita Singh

arXiv:2107.07988·cs.CV·July 19, 2021

TL;DR

This paper introduces a controlled autoencoder framework that morphs facial images based on voice input, leveraging learned voice-face correlations to explore how voices influence perceived facial features.

Contribution

It presents a novel guided autoencoder with a gating controller that conditionally modifies faces according to voice characteristics, enabling voice-driven face morphing.

Findings

01 Effective face morphing guided by voice demonstrated

02 Human evaluation confirms perceptual alignment

03 Face retrieval experiments support model accuracy

Abstract

Multiple studies have shown a strong correlation between human vocal characteristics and facial features. However, existing approaches generate faces directly from voice, without exploring the set of features that contribute to these observed correlations. A computational methodology to explore this can be devised by rephrasing the question as: "how much would a target face have to change in order to be perceived as the originator of a source voice?" With this in perspective, we propose a framework that morphs a target face in response to a given voice such that the facial features are implicitly guided by the learned voice-face correlation. Our framework includes a guided autoencoder that converts one face to another, controlled by a unique model-conditioning component called a gating controller, which modifies the reconstructed face based on the input voice…
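The idea of an autoencoder whose reconstruction is steered by a voice-driven gate can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the class name `GatedAutoencoder`, the single linear encoder and decoder, the sigmoid gate applied multiplicatively to the latent code, and the random untrained weights are all hypothetical and are not taken from the paper, whose actual gating controller may act on different layers and in a different way.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng, scale=0.5):
    """Small random weight matrix; stands in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GatedAutoencoder:
    """Hypothetical toy model: a face autoencoder with a voice-driven gate."""

    def __init__(self, face_dim, voice_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        self.w_enc = rand_matrix(latent_dim, face_dim, rng)    # face -> latent
        self.w_dec = rand_matrix(face_dim, latent_dim, rng)    # latent -> face
        self.w_gate = rand_matrix(latent_dim, voice_dim, rng)  # voice -> gates

    def gates(self, voice):
        # Assumed gating controller: one value in (0, 1) per latent unit.
        return [sigmoid(g) for g in matvec(self.w_gate, voice)]

    def forward(self, face, voice):
        latent = [math.tanh(h) for h in matvec(self.w_enc, face)]
        # The voice scales each latent unit before decoding.
        gated = [g * h for g, h in zip(self.gates(voice), latent)]
        return matvec(self.w_dec, gated)


model = GatedAutoencoder(face_dim=6, voice_dim=3, latent_dim=4)
face = [0.2, -0.1, 0.4, 0.0, 0.3, -0.5]     # stand-in for face features
morph_a = model.forward(face, [1.0, 0.0, 0.5])   # same face, voice A
morph_b = model.forward(face, [-1.0, 0.5, 0.0])  # same face, voice B
```

In this sketch the same input face yields two different reconstructions because only the gate values change with the voice, which is the conditioning behavior the abstract describes at a high level.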

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing