TL;DR
This paper presents a probabilistic, interlocutor-aware method for generating facial gestures for conversational agents. It leverages multi-modal cues from the interlocutor and extends MoGlow, a recent flow-based motion synthesis model, to produce more natural and contextually appropriate non-verbal behavior.
Contribution
The paper introduces a novel probabilistic approach that incorporates multi-modal interlocutor cues into facial gesture synthesis, extending MoGlow for more natural and adaptive agent behaviors.
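MoGlow builds on Glow, a normalizing flow whose core building block is the affine coupling layer. The sketch below shows one such layer with its scale and shift conditioned on an extra input vector, standing in for interlocutor cues. It is a minimal illustration under stated assumptions, not the authors' model: the layer sizes, the tiny random MLP and the class name `ConditionalAffineCoupling` are hypothetical, and the actnorm, invertible 1x1 transforms and recurrent conditioning that Glow and MoGlow also use are omitted. Because the layer is invertible, sampling means drawing a latent vector and running the layer backwards. Different draws under the same condition give different outputs, which is what lets a probabilistic model avoid the repeated motion of a deterministic regressor.

```python
import numpy as np

rng = np.random.default_rng(0)


class ConditionalAffineCoupling:
    """One Glow-style affine coupling step whose scale and shift depend on a
    conditioning vector (here: hypothetical interlocutor features)."""

    def __init__(self, dim, cond_dim, hidden=32, seed=0):
        r = np.random.default_rng(seed)
        self.d = dim // 2                 # size of the half passed through unchanged
        in_dim = self.d + cond_dim
        out_dim = 2 * (dim - self.d)      # log-scale and shift for the other half
        # Tiny one-hidden-layer MLP with random (untrained) weights.
        self.W1 = r.normal(0.0, 0.1, (in_dim, hidden))
        self.W2 = r.normal(0.0, 0.1, (hidden, out_dim))

    def _params(self, x_a, cond):
        h = np.tanh(np.concatenate([x_a, cond], axis=-1) @ self.W1)
        log_s, t = np.split(h @ self.W2, 2, axis=-1)
        return np.tanh(log_s), t          # bounded log-scale for numerical stability

    def forward(self, x, cond):
        """Data -> latent. Returns z and log|det J| per sample."""
        x_a, x_b = x[..., :self.d], x[..., self.d:]
        log_s, t = self._params(x_a, cond)
        z_b = x_b * np.exp(log_s) + t
        return np.concatenate([x_a, z_b], axis=-1), log_s.sum(axis=-1)

    def inverse(self, z, cond):
        """Latent -> data. Used for sampling."""
        z_a, z_b = z[..., :self.d], z[..., self.d:]
        log_s, t = self._params(z_a, cond)
        x_b = (z_b - t) * np.exp(-log_s)
        return np.concatenate([z_a, x_b], axis=-1)


# Hypothetical sizes: one facial-gesture frame and one interlocutor feature vector.
dim, cond_dim = 8, 5
layer = ConditionalAffineCoupling(dim, cond_dim)
cond = rng.normal(size=cond_dim)
# Same condition, different latent draws (temperature 0.7) -> different frames.
samples = [layer.inverse(0.7 * rng.normal(size=dim), cond) for _ in range(3)]
```

A trained model would stack many such layers, alternate which half is transformed, and fit the MLP weights by maximizing the exact log-likelihood that `forward` makes available through the log-determinant.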
Findings
The model effectively uses multi-modal interlocutor cues to generate appropriate facial gestures.
The probabilistic method produces less repetitive motion than deterministic approaches.
Subjective evaluation confirms improved naturalness and appropriateness of the generated gestures.
Abstract
To enable more natural face-to-face interactions, conversational agents need to adapt their behavior to their interlocutors. One key aspect of this is the generation of appropriate non-verbal behavior for the agent, for example facial gestures, here defined as facial expressions and head movements. Most existing gesture-generating systems do not utilize multi-modal cues from the interlocutor when synthesizing non-verbal behavior. Those that do typically use deterministic methods that risk producing repetitive and non-vivid motions. In this paper, we introduce a probabilistic method to synthesize interlocutor-aware facial gestures, represented by highly expressive FLAME parameters, in dyadic conversations. Our contributions are: a) a method for feature extraction from multi-party video and speech recordings, resulting in a representation that allows for independent control and…
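An autoregressive motion model such as MoGlow generates each frame conditioned on recent history and on control inputs. The sketch below shows one plausible way to assemble such a conditioning vector at frame `t`: a window of the agent's own previous facial-gesture frames (for example FLAME expression and head-pose values) concatenated with the interlocutor's features at that frame. The function name `build_condition`, the window length and the feature layout are all hypothetical; the abstract above is truncated and does not specify the exact representation the authors use.

```python
import numpy as np


def build_condition(agent_past, interloc_feats, t, n_past=3):
    """Hypothetical conditioning vector for frame t.

    agent_past:     (T, D_agent) array of the agent's per-frame gesture parameters
    interloc_feats: (T, D_inter) array of per-frame interlocutor features
                    (e.g. speech features and the interlocutor's own face parameters)
    """
    if t < n_past:
        raise ValueError("need at least n_past previous frames")
    # Flatten the agent's n_past frames preceding t (frame t itself is excluded).
    past = agent_past[t - n_past:t].reshape(-1)
    # Append the interlocutor's features observed at frame t.
    return np.concatenate([past, interloc_feats[t]])


# Hypothetical sizes: 10 frames, 6 agent parameters, 4 interlocutor features.
agent = np.arange(60, dtype=float).reshape(10, 6)
interloc = np.ones((10, 4))
cond_t5 = build_condition(agent, interloc, t=5)
```

Keeping the agent's history and the interlocutor's features as separate, clearly delimited slices of the vector is one way to make each input source independently controllable at synthesis time.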
