InteracTalker: Prompt-Based Human-Object Interaction with Co-Speech Gesture Generation

Sreehari Rajan; Kunal Bhosikar; Charu Sharma

arXiv:2512.12664 · cs.CV · December 16, 2025

TL;DR

InteracTalker is a novel framework that integrates co-speech gesture generation with human-object interaction, utilizing a unified embedding space and adaptive fusion to produce realistic, controllable full-body motions in interactive scenarios.

Contribution

We introduce InteracTalker, a comprehensive framework that combines speech, gesture, and object interactions through a multi-stage training process and a new dataset, advancing realistic human motion synthesis.

Findings

01. Outperforms prior gesture-generation methods in realism and object awareness.

02. Unifies speech-driven gestures and object interactions in a single model.

03. Produces realistic, flexible, and controllable full-body motions.

Abstract

Generating realistic human motions that naturally respond to both spoken language and physical objects is crucial for interactive digital experiences. Current methods, however, address speech-driven gestures and object interactions independently, largely because integrated, comprehensive datasets are lacking, which limits their real-world applicability. To overcome this, we introduce InteracTalker, a novel framework that seamlessly integrates prompt-based object-aware interactions with co-speech gesture generation. We achieve this by employing a multi-stage training process to learn a unified motion, speech, and prompt embedding space. To support this, we curate a rich human-object interaction dataset, formed by augmenting an existing text-to-motion dataset with detailed object interaction annotations. Our framework utilizes a Generalized Motion Adaptation Module that enables independent training, adapting to…
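The abstract and TL;DR mention adaptive fusion of speech-driven and prompt-driven motion but do not describe how it is computed. As an illustration only, the sketch below shows one generic form of adaptive fusion, a per-frame sigmoid gate computed from both feature vectors that blends the two streams. The function name `adaptive_fuse`, the gate weights, and the list-of-vectors feature layout are hypothetical assumptions, not InteracTalker's actual module.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_fuse(
    speech_feats: Sequence[Vector],
    prompt_feats: Sequence[Vector],
    gate_w_speech: Sequence[float],
    gate_w_prompt: Sequence[float],
    gate_bias: float = 0.0,
) -> List[Vector]:
    """Hypothetical gated fusion of two per-frame feature streams.

    For each frame t:
        alpha_t = sigmoid(w_s . s_t + w_p . p_t + b)
        fused_t = alpha_t * s_t + (1 - alpha_t) * p_t
    alpha_t near 1 favors the speech stream; near 0 favors the prompt stream.
    """
    if len(speech_feats) != len(prompt_feats):
        raise ValueError("streams must have the same number of frames")
    fused: List[Vector] = []
    for s, p in zip(speech_feats, prompt_feats):
        if not (len(s) == len(p) == len(gate_w_speech) == len(gate_w_prompt)):
            raise ValueError("feature dimensions must match")
        # Scalar gate logit from both streams (assumed linear gate).
        logit = (
            sum(w * x for w, x in zip(gate_w_speech, s))
            + sum(w * x for w, x in zip(gate_w_prompt, p))
            + gate_bias
        )
        alpha = sigmoid(logit)
        # Convex combination of the two feature vectors for this frame.
        fused.append([alpha * a + (1.0 - alpha) * b for a, b in zip(s, p)])
    return fused


if __name__ == "__main__":
    # Zero gate weights and bias give alpha = 0.5, i.e. the per-frame mean.
    print(adaptive_fuse([[1.0, 2.0]], [[3.0, 4.0]], [0.0, 0.0], [0.0, 0.0]))
```

With zero gate weights the output is the frame-wise average, `[[2.0, 3.0]]`; a large positive `gate_bias` pushes the result toward the speech features. A real system would learn the gate parameters jointly with the motion model rather than fix them by hand.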

Taxonomy

Topics: Human Motion and Animation · Social Robot Interaction and HRI · Multimodal Machine Learning Applications