Learning Co-Speech Gesture Representations in Dialogue through Contrastive Learning: An Intrinsic Evaluation
Esam Ghaleb, Bulat Khaertdinov, Wim Pouw, Marlou Rasenberg, Judith Holler, Aslı Özyürek, Raquel Fernández

TL;DR
This paper introduces a self-supervised contrastive learning method that learns meaningful co-speech gesture representations from dialogue data, accounting for gesture variability and the gestures' relationship to speech.
Contribution
It proposes a novel multimodal contrastive learning approach for gesture representation learning grounded in speech, with thorough intrinsic evaluation and interpretability analysis.
Findings
Gesture representations correlate positively with human similarity judgments.
Learned features can recover interpretable gesture characteristics.
The approach captures dialogue interaction dynamics effectively.
Abstract
In face-to-face dialogues, the form-meaning relationship of co-speech gestures varies depending on contextual factors such as what the gestures refer to and the individual characteristics of speakers. These factors make co-speech gesture representation learning challenging. How can we learn meaningful gesture representations considering gestures' variability and relationship with speech? This paper tackles this challenge by employing self-supervised contrastive learning techniques to learn gesture representations from skeletal and speech information. We propose an approach that includes both unimodal and multimodal pre-training to ground gesture representations in co-occurring speech. For training, we utilize a face-to-face dialogue dataset rich with representational iconic gestures. We conduct thorough intrinsic evaluations of the learned representations through comparison with…
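The idea of grounding gesture representations in co-occurring speech through contrastive learning can be illustrated with a small sketch. The abstract does not state which loss the authors use, so the example below assumes a symmetric InfoNCE-style objective, a common choice for aligning two modalities. The function names, the temperature value, and the toy two-dimensional embeddings are all hypothetical and are not taken from the paper. Each gesture embedding is treated as a positive pair with the speech embedding at the same index, and every other speech embedding in the batch acts as a negative.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def cosine_similarity_matrix(rows_a, rows_b):
    """Return the matrix of cosine similarities between every row of a and b."""
    a_unit = [l2_normalize(v) for v in rows_a]
    b_unit = [l2_normalize(v) for v in rows_b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_unit] for u in a_unit]


def cross_entropy_on_diagonal(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(gesture_emb, speech_emb, temperature=0.1):
    """ASSUMED objective (not confirmed by the paper): symmetric InfoNCE.

    gesture_emb[i] and speech_emb[i] form a positive pair; all other
    combinations in the batch serve as negatives.
    """
    if not gesture_emb or len(gesture_emb) != len(speech_emb):
        raise ValueError("expected two non-empty batches of equal size")
    similarities = cosine_similarity_matrix(gesture_emb, speech_emb)
    logits = [[s / temperature for s in row] for row in similarities]
    logits_transposed = [list(column) for column in zip(*logits)]
    gesture_to_speech = cross_entropy_on_diagonal(logits)
    speech_to_gesture = cross_entropy_on_diagonal(logits_transposed)
    return 0.5 * (gesture_to_speech + speech_to_gesture)


# Toy data (hypothetical): two gestures and their co-occurring speech segments.
gestures = [[1.0, 0.0], [0.0, 1.0]]
speech_aligned = [[0.9, 0.1], [0.1, 0.9]]
speech_swapped = [speech_aligned[1], speech_aligned[0]]

loss_aligned = symmetric_info_nce(gestures, speech_aligned)
loss_swapped = symmetric_info_nce(gestures, speech_swapped)
print(loss_aligned < loss_swapped)
```

In this toy setup the loss is lower when each gesture is paired with its own speech segment than when the pairs are swapped, which is the signal a contrastive objective of this kind uses to pull co-occurring gesture and speech embeddings together.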
Taxonomy
Methods: Contrastive Learning
