# Virtual agents as a scalable tool for diverse, robust gesture recognition

**Authors:** Lisa Loy, James P. Trujillo, Floris Roelofsen

PMC · DOI: 10.3758/s13428-025-02914-w · 2026-01-16

## TL;DR

This paper introduces virtual agents as a scalable way to train and test gesture recognition algorithms. The approach can help address data sparsity and enables controlled experiments on recording conditions such as background and lighting.

## Contribution

The paper proposes and demonstrates virtual agents as a tool for training and testing gesture recognition algorithms. This use had not previously been explored in multimodal communication research.

## Key findings

- The best model trained on virtual agents achieved 85.9% accuracy under optimal background and lighting conditions.
- Accuracy dropped to 71.6% when background clutter and reduced lighting were introduced.
- On images of humans, the virtual agent-trained model classified target handshapes with 72% to 95% accuracy.
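The first two findings can be restated as an absolute and a relative drop. This is simple arithmetic on the reported figures, not a statistic reported by the authors:

$$
\Delta_{\text{abs}} = 85.9\% - 71.6\% = 14.3 \text{ percentage points}, \qquad
\Delta_{\text{rel}} = \frac{14.3}{85.9} \approx 0.166 \;(\approx 16.6\%)
$$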

## Abstract

Gesture recognition technology is a popular area of research, offering applications in many fields, including behaviour research, human–computer interaction (HCI), medical research, and surveillance culture, among others. However, the large quantity of data needed to train a recognition algorithm is not always available, and differences between the training set and one’s own research data in factors such as recording conditions and participant characteristics may hinder transferability. To address these issues, we propose training and testing recognition algorithms on virtual agents, a tool that has not yet been used for this purpose in multimodal communication research. We provide an example use case with step-by-step instructions, using mocap data to animate a virtual agent and create customised lighting conditions, backgrounds, and camera angles, creating a virtual agent-only dataset to train and test a gesture recognition algorithm. This approach also allows us to assess the impact of particular features, such as background and lighting. Our best-performing model in optimal background and lighting conditions achieved accuracy of 85.9%. When introducing background clutter and reduced lighting, the accuracy dropped to 71.6%. When testing the virtual agent-trained model on images of humans, the accuracy of target handshape classification ranged from 72% to 95%. The results suggest that training an algorithm on artificial data (1) is a resourceful, convenient, and effective way to customise algorithms, (2) potentially addresses issues of data sparsity, and (3) can be used to assess the impact of many contextual and environmental factors that would not be feasible to systematically assess using human data.
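The controlled-experiment idea in the abstract, where backgrounds, lighting, and camera angles are varied systematically and accuracy is compared across them, can be sketched as a factorial grid of rendering conditions plus a per-factor accuracy breakdown. This is a minimal illustration under stated assumptions, not the authors' pipeline. The factor levels, the handshape labels (`"flat"`, `"fist"`), and the helper names `condition_grid` and `accuracy_by_condition` are all hypothetical. Rendering and model training are out of scope here.

```python
from collections import defaultdict
from itertools import product

# Hypothetical factor levels; the paper's actual levels may differ.
BACKGROUNDS = ["plain", "cluttered"]
LIGHTING = ["optimal", "reduced"]
CAMERA_ANGLES = ["frontal", "left_30", "right_30"]


def condition_grid():
    """Return every combination of background, lighting and camera angle."""
    return [
        {"background": b, "lighting": l, "camera": c}
        for b, l, c in product(BACKGROUNDS, LIGHTING, CAMERA_ANGLES)
    ]


def accuracy_by_condition(records, factor):
    """Group (condition, true_label, predicted_label) records by one factor
    and return the fraction of correct predictions for each factor level."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for condition, true_label, predicted_label in records:
        level = condition[factor]
        total[level] += 1
        correct[level] += int(true_label == predicted_label)
    return {level: correct[level] / total[level] for level in total}


if __name__ == "__main__":
    grid = condition_grid()
    print(len(grid))  # 2 * 2 * 3 = 12 rendering conditions
    # Toy predictions for illustration only, not results from the paper.
    plain = {"background": "plain", "lighting": "optimal", "camera": "frontal"}
    clutter = {"background": "cluttered", "lighting": "reduced", "camera": "frontal"}
    toy = [
        (plain, "flat", "flat"),
        (plain, "fist", "fist"),
        (clutter, "flat", "fist"),
        (clutter, "fist", "fist"),
    ]
    print(accuracy_by_condition(toy, "background"))  # {'plain': 1.0, 'cluttered': 0.5}
```

Because every factor combination is rendered from the same mocap animation, differences in per-level accuracy can be attributed to the varied factor rather than to differences between recorded participants. That kind of comparison is hard to arrange systematically with human recordings.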

The online version contains supplementary material available at 10.3758/s13428-025-02914-w.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12811268/full.md

---
Source: https://tomesphere.com/paper/PMC12811268