Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?
Cho-Ying Wu, Chin-Cheng Hsu, Ulrich Neumann

TL;DR
This paper investigates whether 3D face geometry can be inferred from voices, proposing a new dataset and analysis framework to explore the physiological correlation between voice and face structure.
Contribution
It introduces the Voxceleb-3D dataset and a cross-modal analysis framework, Cross-Modal Perceptionist, for studying how voices relate to face geometry under both supervised and unsupervised learning.
Findings
Voice and face geometry are correlated, consistent with findings from neuroscience.
Face geometry can be partially reconstructed from voices.
The framework enables explainable cross-modal perception analysis.
Abstract
This work digs into a root question in human perception: can face geometry be gleaned from one's voices? Previous works that study this question only adopt developments in image synthesis and convert voices into face images to show correlations, but working on the image domain unavoidably involves predicting attributes that voices cannot hint, including facial textures, hairstyles, and backgrounds. We instead investigate the ability to reconstruct 3D faces to concentrate on only geometry, which is much more physiologically grounded. We propose our analysis framework, Cross-Modal Perceptionist, under both supervised and unsupervised learning. First, we construct a dataset, Voxceleb-3D, which extends Voxceleb and includes paired voices and face meshes, making supervised learning possible. Second, we use a knowledge distillation mechanism to study whether face geometry can still be gleaned…
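The abstract names knowledge distillation as the way to study the unsupervised setting. The following is a minimal sketch of that general idea, not the authors' method. It assumes a pretrained teacher supplies 3D face (3DMM-style) parameters as pseudo-labels, and a voice-side student is trained to regress them with a mean squared error loss. The linear student, the names `linear` and `distill_step`, the vector sizes, and the learning rate are all hypothetical choices for illustration.

```python
def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    assert len(pred) == len(target)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def linear(W, x):
    """Hypothetical student: a linear map from a voice embedding x
    to a vector of face-shape parameters (one row of W per parameter)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def distill_step(W, voice_emb, teacher_params, lr=0.1):
    """One gradient-descent step pulling the student's output toward the
    teacher's pseudo-label (assumed to come from a pretrained face model).
    Updates W in place and returns the post-step loss."""
    pred = linear(W, voice_emb)  # computed once, before any update
    n = len(pred)
    for i, row in enumerate(W):
        # d(MSE)/d(W[i][j]) = 2/n * (pred_i - target_i) * x_j
        g = 2.0 / n * (pred[i] - teacher_params[i])
        for j in range(len(row)):
            row[j] -= lr * g * voice_emb[j]
    return mse(linear(W, voice_emb), teacher_params)
```

In the supervised setting the same loop would apply with ground-truth parameters fitted to paired face meshes (as Voxceleb-3D provides) in place of the teacher's pseudo-labels; only the source of the target vector changes.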
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Face Recognition and Perception
Methods: Knowledge Distillation
