Speech-driven Animation with Meaningful Behaviors
Najmeh Sadoughi, Carlos Busso

TL;DR
This paper introduces a novel speech-driven animation method that combines rule-based and data-driven approaches using a dynamic Bayesian network to generate meaningful, synchronized gestures reflecting speech content.
Contribution
It proposes a constrained DBN model that incorporates discourse functions and prototypical behaviors to produce more natural and meaningful agent movements.
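The idea of constraining a DBN with a discrete variable can be illustrated with a minimal sketch. This is not the authors' model: it assumes a single hidden gesture-state chain whose transition matrix is selected by a per-frame constraint label, and it omits the speech and gesture observation nodes a speech-driven model would have. The label names, the number of states, and the helper functions `make_transitions` and `sample_states` are all hypothetical.

```python
import numpy as np

# Hypothetical constraint labels; the paper's actual label set is not given here.
CONSTRAINTS = ["none", "question", "negation"]
N_STATES = 3  # hypothetical number of hidden gesture states


def make_transitions(n_constraints, n_states, seed=0):
    """Build one row-stochastic transition matrix per constraint value."""
    rng = np.random.default_rng(seed)
    raw = rng.random((n_constraints, n_states, n_states))
    # Normalize each row so it is a valid distribution over next states.
    return raw / raw.sum(axis=2, keepdims=True)


def sample_states(transitions, constraint_seq, seed=0):
    """Sample hidden states where P(h_t | h_{t-1}, c_t) depends on constraint c_t."""
    rng = np.random.default_rng(seed)
    n_states = transitions.shape[1]
    state = rng.integers(n_states)  # uniform initial state (an assumption)
    states = []
    for c in constraint_seq:
        # The constraint index picks which transition matrix governs this step.
        state = rng.choice(n_states, p=transitions[c, state])
        states.append(int(state))
    return states


transitions = make_transitions(len(CONSTRAINTS), N_STATES)
constraint_seq = [0, 0, 1, 1, 2]  # per-frame indices into CONSTRAINTS
states = sample_states(transitions, constraint_seq)
```

In this toy setup, changing the constraint label switches the dynamics of the hidden chain, which is the sense in which a discrete variable can condition the generated behavior on the intended meaning.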
Findings
The constrained model outperforms unconstrained models in evaluations.
The approach effectively synchronizes gestures with speech.
It captures meaningful behaviors aligned with discourse and prototypical cues.
Abstract
Conversational agents (CAs) play an important role in human-computer interaction. Creating believable movements for CAs is challenging, since the movements have to be meaningful and natural, reflecting the coupling between gestures and speech. Past studies have mainly relied on rule-based or data-driven approaches. Rule-based methods focus on creating meaningful behaviors that convey the underlying message, but the gestures cannot be easily synchronized with speech. Data-driven approaches, especially speech-driven models, can capture the relationship between speech and gestures. However, they create behaviors that disregard the meaning of the message. This study proposes to bridge the gap between these two approaches, overcoming their limitations. The approach builds a dynamic Bayesian network (DBN), where a discrete variable is added to constrain the behaviors on the underlying…
