Prompt-to-Gesture: Measuring the Capabilities of Image-to-Video Deictic Gesture Generation

Hassan Ali; Doreen Jirak; Luca Müller; Stefan Wermter

arXiv:2604.14953·cs.CV·April 17, 2026

TL;DR

This paper explores the use of prompt-based image-to-video models to generate realistic deictic gestures, augmenting limited real data and improving downstream gesture recognition performance.

Contribution

The paper introduces a pipeline that generates synthetic gesture data from a few samples and shows that the resulting data is effective, and adds useful variability, for machine learning tasks.

Findings

01 Synthetic gestures closely match real ones in visual quality

02 Generated data adds meaningful variability and novelty

03 Models perform better with mixed real and synthetic data

Abstract

Gesture recognition research, unlike NLP, continues to face acute data scarcity, with progress constrained by the need for costly human recordings or image processing approaches that cannot generate authentic variability in the gestures themselves. Recent advancements in image-to-video foundation models have enabled the generation of photorealistic, semantically rich videos guided by natural language. These capabilities open up new possibilities for creating effort-free synthetic data, raising the critical question of whether video Generative AI models can augment and complement traditional human-generated gesture data. In this paper, we introduce and analyze prompt-based video generation to construct a realistic deictic gestures dataset and rigorously evaluate its effectiveness for downstream tasks. We propose a data generation pipeline that produces deictic gestures from a small…

Code & Models

Datasets

sano90/prompt-to-gesture
dataset · 14 dl
