Retrieving Semantics from the Deep: an RAG Solution for Gesture Synthesis
M. Hamza Mughal, Rishabh Dabral, Merel C.J. Scholman, Vera Demberg, Christian Theobalt

TL;DR
This paper introduces RAG-Gesture, a diffusion-based method that uses retrieval-augmented generation to produce semantically meaningful co-speech gestures grounded in linguistic knowledge, without requiring additional training.
Contribution
The paper presents a novel RAG-based gesture synthesis approach that retrieves exemplar gestures and integrates them into a diffusion model, enabling semantic gesture generation without additional training.
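The retrieval half of this contribution can be illustrated in its simplest form: map an utterance to interpretable linguistic tags, then rank database clips by shared tags. This is a minimal sketch under loudly stated assumptions. The cue-word lexicon, the tag names, `GestureClip`, and the overlap-count scoring are all hypothetical stand-ins, not the paper's retrieval method, and the diffusion-side integration is not sketched.

```python
from dataclasses import dataclass

# HYPOTHETICAL lexicon standing in for "explicit linguistic knowledge":
# cue words mapped to interpretable tags. The paper's actual features
# are not described on this page.
CUE_LEXICON = {
    "but": "contrast",
    "however": "contrast",
    "because": "cause",
    "big": "size",
    "huge": "size",
}


@dataclass(frozen=True)
class GestureClip:
    """Hypothetical database entry: a motion clip annotated offline with tags."""
    clip_id: str
    tags: frozenset
    frames: tuple = ()  # placeholder for pose data


def linguistic_tags(utterance: str) -> set:
    """Return the tags triggered by cue words in the utterance."""
    words = (w.strip(".,!?;:").lower() for w in utterance.split())
    return {CUE_LEXICON[w] for w in words if w in CUE_LEXICON}


def retrieve_exemplars(utterance: str, database: list, k: int = 1) -> list:
    """Return up to k clips sharing the most tags with the utterance.

    Clips with no shared tag are never returned; ties are broken by
    clip_id so the result is deterministic.
    """
    query = linguistic_tags(utterance)
    scored = [(len(query & clip.tags), clip) for clip in database]
    scored = [(score, clip) for score, clip in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1].clip_id))
    return [clip for _, clip in scored[:k]]


if __name__ == "__main__":
    db = [
        GestureClip("c1", frozenset({"contrast"})),
        GestureClip("c2", frozenset({"size"})),
        GestureClip("c3", frozenset({"contrast", "cause"})),
    ]
    hits = retrieve_exemplars(
        "It looked small, but it was huge because of the zoom.", db, k=2
    )
    print([clip.clip_id for clip in hits])  # ['c3', 'c1']
```

Keyword matching is used here only because it keeps every retrieval decision inspectable, echoing the interpretability the abstract emphasizes; a practical system would likely rely on much richer linguistic analysis.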
Findings
Outperforms recent gesture generation methods in semantic relevance
Allows user control over the influence of retrieved exemplars
Demonstrates effective semantic grounding in generated gestures
Abstract
Non-verbal communication often comprises semantically rich gestures that help convey the meaning of an utterance. Producing such semantic co-speech gestures has been a major challenge for existing neural systems, which can generate rhythmic beat gestures but struggle to produce semantically meaningful ones. We therefore present RAG-Gesture, a diffusion-based gesture generation approach that leverages Retrieval-Augmented Generation (RAG) to produce natural-looking and semantically rich gestures. Our neuro-explicit gesture generation approach is designed to produce semantic gestures grounded in interpretable linguistic knowledge. We achieve this by using explicit domain knowledge to retrieve exemplar motions from a database of co-speech gestures. Once retrieved, these semantic exemplar gestures are injected into our diffusion-based gesture generation pipeline using DDIM…
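The abstract breaks off right after naming DDIM, so the exact injection mechanism is not stated on this page. As background only, the standard deterministic DDIM sampling step (η = 0) from Song et al., "Denoising Diffusion Implicit Models" (ICLR 2021), is written below, where ε_θ is the trained noise predictor and ᾱ_t is the cumulative noise schedule. How RAG-Gesture builds on this step is not shown here.

```latex
\begin{aligned}
\hat{x}_0 &= \frac{x_t - \sqrt{1 - \bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}, \\
x_{t-1} &= \sqrt{\bar{\alpha}_{t-1}}\,\hat{x}_0 + \sqrt{1 - \bar{\alpha}_{t-1}}\,\epsilon_\theta(x_t, t).
\end{aligned}
```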
Taxonomy
Topics: Hand Gesture Recognition Systems · Robotics and Automated Systems · Speech and dialogue systems
