InteracTalker: Prompt-Based Human-Object Interaction with Co-Speech Gesture Generation
Sreehari Rajan, Kunal Bhosikar, Charu Sharma

TL;DR
InteracTalker is a novel framework that integrates co-speech gesture generation with human-object interaction, utilizing a unified embedding space and adaptive fusion to produce realistic, controllable full-body motions in interactive scenarios.
Contribution
We introduce InteracTalker, a comprehensive framework that combines speech, gesture, and object interactions through a multi-stage training process and a new dataset, advancing realistic human motion synthesis.
Findings
Outperforms prior gesture generation methods in realism and object-awareness
Successfully unifies speech-driven gestures and object interactions in a single model
Produces highly realistic, flexible, and controllable full-body motions
Abstract
Generating realistic human motions that naturally respond to both spoken language and physical objects is crucial for interactive digital experiences. Current methods, however, address speech-driven gestures or object interactions independently, limiting real-world applicability due to a lack of integrated, comprehensive datasets. To overcome this, we introduce InteracTalker, a novel framework that seamlessly integrates prompt-based object-aware interactions with co-speech gesture generation. We achieve this by employing a multi-stage training process to learn a unified motion, speech, and prompt embedding space. To support this, we curate a rich human-object interaction dataset, formed by augmenting an existing text-to-motion dataset with detailed object interaction annotations. Our framework utilizes a Generalized Motion Adaptation Module that enables independent training, adapting to…
Taxonomy
Topics: Human Motion and Animation · Social Robot Interaction and HRI · Multimodal Machine Learning Applications
