LLAniMAtion: LLAMA Driven Gesture Animation
Jonathan Windle, Iain Matthews, Sarah Taylor

TL;DR
This paper introduces an approach to co-speech gesture animation that drives character motion with features extracted from text by the LLAMA2 language model. These features outperform audio features and allow gestures to be generated without any audio input.
Contribution
The study demonstrates that LLAMA2 features alone can generate synchronised gestures and outperform audio features, and it tests whether combining the audio and text modalities improves gesture synthesis.
Findings
LLAMA2 features outperform audio features in gesture generation
Combining audio and text features yields no significant improvement over text alone
LLAMA2 enables gesture synthesis without any audio input
Abstract
Co-speech gesturing is an important modality in conversation, providing context and social cues. In character animation, appropriate and synchronised gestures add realism, and can make interactive agents more engaging. Historically, methods for automatically generating gestures were predominantly audio-driven, exploiting the prosodic and speech-related content that is encoded in the audio signal. In this paper we instead experiment with using LLM features for gesture generation that are extracted from text using LLAMA2. We compare against audio features, and explore combining the two modalities in both objective tests and a user study. Surprisingly, our results show that LLAMA2 features on their own perform significantly better than audio features and that including both modalities yields no significant difference to using LLAMA2 features in isolation. We demonstrate that the LLAMA2…
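The abstract compares text-derived features, audio features, and their combination as inputs to a gesture model. The sketch below illustrates one plausible way to prepare such inputs; it is not taken from the paper. It assumes per-token feature vectors with known start and end times, repeats each token's vector over the motion frames its time span covers, and concatenates the result with per-frame audio features. The function names `tokens_to_frames` and `fuse_modalities`, the 30 fps frame rate, the feature sizes, and the random stand-in features are all hypothetical.

```python
import numpy as np


def tokens_to_frames(token_feats, token_spans, n_frames, fps):
    """Repeat each token's feature vector over the frames its time span covers.

    Frames not covered by any token (for example, pauses) stay all-zero.
    This alignment scheme is an assumption made for illustration.
    """
    token_feats = np.asarray(token_feats, dtype=np.float32)
    frames = np.zeros((n_frames, token_feats.shape[1]), dtype=np.float32)
    for feat, (start_s, end_s) in zip(token_feats, token_spans):
        first = max(0, int(round(start_s * fps)))
        last = min(n_frames, int(round(end_s * fps)))
        frames[first:last] = feat
    return frames


def fuse_modalities(text_frames, audio_frames):
    """Concatenate per-frame text and audio features along the feature axis."""
    if text_frames.shape[0] != audio_frames.shape[0]:
        raise ValueError("both modalities must have the same number of frames")
    return np.concatenate([text_frames, audio_frames], axis=1)


# Toy data: random vectors stand in for language-model and audio features.
rng = np.random.default_rng(0)
fps = 30                      # hypothetical motion frame rate
n_frames = 30                 # one second of motion
token_feats = rng.standard_normal((3, 8))
token_spans = [(0.0, 0.2), (0.2, 0.5), (0.7, 1.0)]  # seconds; 0.5-0.7 is a pause

text_frames = tokens_to_frames(token_feats, token_spans, n_frames, fps)
audio_frames = rng.standard_normal((n_frames, 4)).astype(np.float32)
fused = fuse_modalities(text_frames, audio_frames)

print(text_frames.shape, fused.shape)
```

Under these assumptions, a text-only model would consume `text_frames` directly, while a multimodal model would consume `fused`. Simple concatenation is only one possible fusion choice.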
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Motion and Animation · Human Pose and Action Recognition
