TL;DR
This paper presents DRAM, a neural model that predicts natural avatar body poses during conversations by modeling both individual and interactive behaviors using adaptive attention, improving realism and interaction quality.
Contribution
Introduces DRAM, a novel neural architecture that combines intrapersonal and interpersonal dynamics with adaptive attention for end-to-end pose forecasting in avatars during dyadic conversations.
Findings
DRAM outperforms non-adaptive models in the naturalness of the generated poses.
Adaptive attention effectively captures interpersonal dynamics.
User study confirms improved realism of avatar behaviors.
Abstract
Non-verbal behaviours such as gestures, facial expressions, body posture, and para-linguistic cues have been shown to complement or clarify verbal messages. Hence, to improve telepresence in the form of an avatar, it is important to model these behaviours, especially in dyadic interactions. Creating such personalized avatars requires a model not only of the intrapersonal dynamics between an avatar's speech and their body pose, but also of the interpersonal dynamics with the interlocutor present in the conversation. In this paper, we introduce a neural architecture named Dyadic Residual-Attention Model (DRAM), which integrates intrapersonal (monadic) and interpersonal (dyadic) dynamics using selective attention to generate sequences of body pose conditioned on the audio and body pose of the interlocutor and the audio of the human operating the avatar. We evaluate our proposed model on dyadic…
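To make the monadic/dyadic split concrete, here is a minimal per-frame sketch of one way a "residual attention" fusion could work. It is an illustration under stated assumptions, not the authors' implementation: we assume a monadic branch predicts pose from the operator's own audio, a dyadic branch predicts a correction from the interlocutor's audio and pose, and a scalar sigmoid gate (our stand-in for DRAM's selective attention) decides how much of that correction to add on top of the monadic prediction. The linear maps, the function names `residual_attention_step` and `generate_sequence`, and the weight arguments are all hypothetical; the paper's actual layers and feature dimensions are not given here.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix (one row per output) by a feature vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def residual_attention_step(
    own_audio: Vector,
    partner_audio: Vector,
    partner_pose: Vector,
    w_monadic: Matrix,  # hypothetical: maps own audio -> pose
    w_dyadic: Matrix,   # hypothetical: maps partner audio+pose -> pose correction
    w_gate: Vector,     # hypothetical: scores all inputs -> attention logit
) -> Tuple[Vector, float]:
    # Monadic (intrapersonal) branch: pose from the operator's own audio only.
    monadic = matvec(w_monadic, own_audio)
    # Dyadic (interpersonal) branch: correction from the interlocutor's audio and pose.
    partner = partner_audio + partner_pose  # list concatenation
    dyadic = matvec(w_dyadic, partner)
    # Assumed scalar gate in (0, 1): how much the interlocutor matters this frame.
    alpha = sigmoid(sum(g * x for g, x in zip(w_gate, own_audio + partner)))
    # Residual fusion: the gated dyadic term is added on top of the monadic prediction.
    pose = [m + alpha * d for m, d in zip(monadic, dyadic)]
    return pose, alpha


def generate_sequence(own_audio_seq, partner_audio_seq, partner_pose_seq,
                      w_monadic, w_dyadic, w_gate) -> List[Vector]:
    """Apply the per-frame step across aligned input sequences."""
    return [
        residual_attention_step(a, pa, pp, w_monadic, w_dyadic, w_gate)[0]
        for a, pa, pp in zip(own_audio_seq, partner_audio_seq, partner_pose_seq)
    ]


if __name__ == "__main__":
    pose, alpha = residual_attention_step(
        own_audio=[1.0, 0.0], partner_audio=[0.0], partner_pose=[1.0],
        w_monadic=[[1, 0], [0, 1]], w_dyadic=[[0, 1], [1, 0]],
        w_gate=[0, 0, 0, 0],
    )
    print(pose, alpha)  # [1.5, 0.0] 0.5
```

With an all-zero gate the attention weight is 0.5, so half of the dyadic correction is added. A strongly negative gate logit drives the output back to the monadic prediction, which is the intuition behind letting attention "switch off" the interlocutor when they are irrelevant.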
