Naturalistic Head Motion Generation from Speech
Trisha Mittal, Zakaria Aldeneh, Masha Fedzechkina, Anurag Ranjan, Barry-John Theobald

TL;DR
This paper studies the perceptual quality of head motion generated from speech for embodied conversational agents. It finds that commonly used objective metrics do not align well with human perception, which points to the need for better evaluation methods.
Contribution
The study shows that head motions sampled from a generative model vary in perceptual quality, and that objective metrics commonly used in prior work do not accurately reflect that quality.
Findings
Generated head motions vary in perceptual quality.
Current objective metrics poorly correlate with human perception.
More diverse head motions are not equally acceptable: samples from the same generative model differ in perceptual quality.
Abstract
Synthesizing natural head motion to accompany speech for an embodied conversational agent is necessary for providing a rich interactive experience. Most prior works assess the quality of generated head motion by comparing it against a single ground truth using an objective metric. Yet there are many plausible head motion sequences to accompany a speech utterance. In this work, we study the variation in the perceptual quality of head motions sampled from a generative model. We show that, despite providing more diverse head motions, the generative model produces motions with varying degrees of perceptual quality. We finally show that objective metrics commonly used in previous research do not accurately reflect the perceptual quality of generated head motions. These results open an interesting avenue for future work to investigate better objective metrics that correlate with human…
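One way to check whether an objective metric tracks perceptual quality is to rank-correlate per-sample metric values with per-sample human ratings. The abstract does not say which metrics or which statistic the authors used, so the following is only a minimal sketch under assumptions: it uses Spearman rank correlation, the function names are hypothetical, and the numbers are invented placeholders, not results from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder values invented for illustration only (NOT data from the paper).
# metric_error: hypothetical distance of each generated sample to the ground truth.
# human_rating: hypothetical mean naturalness rating of the same samples.
metric_error = [0.10, 0.20, 0.30, 0.40, 0.50]
human_rating = [3.0, 4.5, 2.0, 4.0, 3.5]

rho = spearman(metric_error, human_rating)
print(f"Spearman rho = {rho:.2f}")
```

For an error-style metric, a useful metric would give a strongly negative rho (lower error, higher rating). A value near zero would mean the metric says little about how raters perceive the motion.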
Taxonomy
Topics: Face recognition and analysis · Social Robot Interaction and HRI · Human Motion and Animation
