Gesture Generation from Trimodal Context for Humanoid Robots
Shiyi Tang, Christian Dondrup

TL;DR
This paper presents a method for generating natural, diverse co-speech gestures for humanoid robots using tri-modal inputs, improving gesture realism, alignment, and style diversity in HRI.
Contribution
It adapts and applies a tri-modal gesture generation approach to robots, enhancing naturalness, style diversity, and speech alignment in robot gestures.
Findings
Gestures were successfully transferred to robots with diverse styles.
Generated gestures showed significant correlation with speech content.
Participants preferred and distinguished between different gesture styles.
Abstract
Natural co-speech gestures are essential for improving the experience of human-robot interaction (HRI). However, current gesture generation approaches have several limitations: the gestures are often unnatural, are poorly aligned with the speech and its content, or lack diverse speaker styles. Therefore, this work aims to reproduce the work of Yoon et al., which generates natural gestures in simulation from tri-modal inputs, and to apply it to a robot. For objective evaluation, "motion variance" and "Fréchet Gesture Distance (FGD)" are employed. Human participants were then recruited to evaluate the gestures subjectively. Results show that the movements from that paper were successfully transferred to the robot, and that the gestures have diverse styles and are correlated with the speech. Moreover, there is a significant likeability and style difference between…
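To make the FGD metric named in the abstract more concrete, the following is a minimal sketch of the Fréchet distance between Gaussian fits of two feature sets, which is the quantity FGD-style metrics are built on. It is not the authors' implementation. The function name `frechet_distance` and the input arrays are hypothetical, and the sketch assumes that gesture sequences have already been mapped to fixed-length feature vectors by some pretrained feature extractor, which is not shown here.

```python
import numpy as np


def frechet_distance(real_feats, gen_feats):
    """Frechet distance between Gaussian fits of two feature sets.

    Both inputs are assumed (hypothetically) to be arrays of shape
    (n_samples, n_features) produced by a gesture feature extractor.
    """
    real_feats = np.asarray(real_feats, dtype=np.float64)
    gen_feats = np.asarray(gen_feats, dtype=np.float64)

    # Fit a Gaussian (mean and covariance) to each feature set.
    mu_r = real_feats.mean(axis=0)
    mu_g = gen_feats.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(real_feats, rowvar=False))
    cov_g = np.atleast_2d(np.cov(gen_feats, rowvar=False))

    # Tr(sqrt(cov_r @ cov_g)) equals the sum of the square roots of the
    # eigenvalues of the product; tiny negative values from round-off
    # are clipped to zero before taking the root.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 4))   # stand-in for features of human gestures
    generated = real + 1.0             # stand-in for features of generated gestures
    print(frechet_distance(real, real))
    print(frechet_distance(real, generated))
```

In this toy usage the second set is the first one shifted by a constant, so the covariances match and the distance reduces to the squared distance between the means. Lower values indicate that the generated feature distribution is closer to the reference one.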
Taxonomy
Topics: Hand Gesture Recognition Systems · Social Robot Interaction and HRI · Robotics and Automated Systems
