Reconstructing faces from voices

Yandong Wen; Rita Singh; Bhiksha Raj

arXiv:1905.10604 · cs.SD · June 4, 2019 · 1 citation



TL;DR

This paper introduces a GAN-based method to reconstruct faces from voice recordings, demonstrating that generated faces can match biometric speaker characteristics with accuracy surpassing chance levels.

Contribution

The paper presents a GAN-based framework for reconstructing faces from voice, trained by matching the identities of generated faces to those of the speakers.

Findings

01. Generated faces match speaker biometric traits

02. Matching accuracy significantly exceeds chance

03. GAN-based approach effectively links voice and face features

Abstract

Voice profiling aims at inferring various human parameters from their speech, e.g. gender, age, etc. In this paper, we address the challenge posed by a subtask of voice profiling - reconstructing someone's face from their voice. The task is designed to answer the question: given an audio clip spoken by an unseen person, can we picture a face that has as many common elements, or associations as possible with the speaker, in terms of identity? To address this problem, we propose a simple but effective computational framework based on generative adversarial networks (GANs). The network learns to generate faces from voices by matching the identities of generated faces to those of the speakers, on a training set. We evaluate the performance of the network by leveraging a closely related task - cross-modal matching. The results show that our model is able to generate faces that match several…
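The cross-modal matching evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that a probe (for example a voice, or a face generated from it) and each candidate face have already been mapped to fixed-length embedding vectors, and that the candidate with the highest cosine similarity is chosen. The function names, the use of cosine similarity, and the trial layout are hypothetical and are not taken from the paper or its repository.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# One trial: (probe embedding, candidate embeddings, index of the true match).
Trial = Tuple[Vector, List[Vector], int]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_match(probe: Vector, candidates: List[Vector]) -> int:
    """Return the index of the candidate most similar to the probe."""
    scores = [cosine_similarity(probe, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


def matching_accuracy(trials: List[Trial]) -> float:
    """Fraction of trials in which the true candidate is selected."""
    if not trials:
        raise ValueError("at least one trial is required")
    correct = sum(
        1 for probe, candidates, truth in trials
        if pick_match(probe, candidates) == truth
    )
    return correct / len(trials)


def chance_level(num_candidates: int) -> float:
    """Expected accuracy of uniform random guessing among the candidates."""
    return 1.0 / num_candidates


# Toy embeddings (made up for this example, not real data).
toy_trials: List[Trial] = [
    ([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]], 0),
    ([0.0, 1.0], [[1.0, 0.0], [0.2, 0.8]], 1),
    ([1.0, 1.0], [[1.0, -1.0], [1.0, 1.0]], 1),
    ([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]], 0),  # deliberately mislabeled trial
]
accuracy = matching_accuracy(toy_trials)
```

With two candidates per trial, random guessing gives an expected accuracy of `chance_level(2)`, so an accuracy above that value would indicate that the embeddings carry identity information shared across the two modalities.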


Code & Models

Repositories

cmu-mlsp/reconstructing_faces_from_voices (PyTorch)


Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis