ACT2G: Attention-based Contrastive Learning for Text-to-Gesture Generation

Hitoshi Teshima; Naoki Wake; Diego Thomas; Yuta Nakashima; Hiroshi Kawasaki; Katsushi Ikeuchi

arXiv:2309.16162·cs.HC·September 29, 2023

Open Access

TL;DR

This paper introduces ACT2G, a novel attention-based contrastive learning method for generating content-representative gestures from text, improving realism and diversity in avatar communication.

Contribution

It proposes a new contrastive learning approach that aligns text and gesture features in a shared latent space, enabling content-aware gesture generation.
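As a rough illustration of what aligning text and gesture features in a shared latent space can look like, the sketch below computes a generic symmetric contrastive (InfoNCE-style) loss over paired feature vectors. This is an assumption made for illustration, not the loss or architecture from the paper: the function names, the temperature value, and the random toy features are all hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_logits(text_feats, gesture_feats, temperature):
    """Pairwise cosine similarities between every text and gesture vector,
    divided by a temperature (hypothetical hyperparameter)."""
    t = [normalize(v) for v in text_feats]
    g = [normalize(v) for v in gesture_feats]
    return [[sum(a * b for a, b in zip(ti, gj)) / temperature for gj in g]
            for ti in t]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def contrastive_loss(text_feats, gesture_feats, temperature=0.1):
    """Symmetric loss: text-to-gesture plus gesture-to-text, averaged."""
    logits = cosine_logits(text_feats, gesture_feats, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(transposed))


# Toy data: random vectors stand in for encoder outputs (hypothetical).
random.seed(0)
text = [[random.gauss(0, 1) for _ in range(16)] for _ in range(4)]
aligned = [[x + 0.01 * random.gauss(0, 1) for x in row] for row in text]
mismatched = aligned[::-1]  # pair each text with the wrong gesture

loss_aligned = contrastive_loss(text, aligned)
loss_mismatched = contrastive_loss(text, mismatched)
print(loss_aligned < loss_mismatched)
```

The loss is low when each text vector is closest to its own gesture vector and high when the pairs are scrambled, which is the property a shared latent space is meant to provide.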

Findings

01 User study shows ACT2G outperforms existing methods

02 Generated gestures better reflect text content

03 Wide variation in gestures from the same text demonstrated

Abstract

The recent increase in remote work, online meetings, and tele-operation tasks has made people realize that gestures for avatars and communication robots are more important than previously thought. Gesture is one of the key factors in achieving smooth and natural communication between humans and AI systems and has been intensively researched. Current gesture generation methods are mostly based on deep neural networks that take text, audio, and other information as input; however, they generate gestures mainly from audio, and such gestures are called beat gestures. Although beat gestures account for more than 70% of actual human gestures, content-based gestures sometimes play an important role in making avatars more realistic and human-like. In this paper, we propose attention-based contrastive learning for text-to-gesture (ACT2G), where generated gestures represent the content of the text by estimating attention weight…

Taxonomy

Topics: Hand Gesture Recognition Systems · Speech and dialogue systems · Hearing Impairment and Communication