OpenWatch: A Multimodal Benchmark for Hand Gesture Recognition on Smartwatches

Pietro Bonazzi; Youssef Ahmed; Daniel Eckert; Andrea Ronco; Junjie Zeng; Dengxin Dai; Michele Magno

arXiv:2605.04791·cs.HC·May 8, 2026

TL;DR

OpenWatch introduces a comprehensive multimodal benchmark for wrist-based gesture recognition on smartwatches, providing new methodologies and insights into modality fusion, model adaptation, and resource efficiency.

Contribution

This work presents the first open-access multimodal benchmark for smartwatch gesture recognition, together with two novel methods, MixToken and NormWear-Lora, and empirical guidance for model design.

Findings

01

Adding PPG signals improves the F1-score by 12.5%.

02

MixToken outperforms fine-tuned foundation models, reaching a 90% F1-score versus 66%.

03

Task-specific architectures are more accurate and memory-efficient.

Abstract

Despite the widespread adoption of smartwatches worldwide, open benchmarks for wrist-based gesture recognition remain surprisingly limited. In this work, we introduce OpenWatch, the first open-access multimodal benchmark for wrist-based gesture recognition using synchronized inertial and physiological sensing on a commercial smartwatch. It contains over 10 hours of Inertial Measurement Unit (IMU) and Photoplethysmography (PPG) data from 50 participants and a vocabulary of 59 labelled gesture sequences. Furthermore, we present a subject-independent evaluation protocol that covers traditional and deep learning methods for time-series classification. On top of this, we develop two novel methodologies for hand-gesture recognition: (i) MixToken, a task-specific mixture-of-experts that fuses per-channel IMU filterbank features with cross-channel statistical tokens through learned logit mixing,…
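
To make the idea of a subject-independent evaluation concrete, here is a minimal Python sketch of a split in which no participant contributes data to both the training and the test set. It is an illustration only: the function name `subject_independent_split`, the 20% test fraction, the seed, and the toy data (4 recordings per participant) are assumptions, not details of the OpenWatch protocol, whose exact fold structure is not given in this excerpt.

```python
import random


def subject_independent_split(subject_ids, test_fraction=0.2, seed=0):
    """Split sample indices so that no subject appears in both sets.

    subject_ids: one subject label per sample (hypothetical layout).
    Returns (train_indices, test_indices, held_out_subjects).
    """
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)

    # Hold out whole subjects, never individual samples.
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])

    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx, test_subjects


# Toy data (assumed): 50 participants with 4 recordings each.
subject_ids = [p for p in range(50) for _ in range(4)]
train_idx, test_idx, test_subjects = subject_independent_split(subject_ids)

train_subjects = {subject_ids[i] for i in train_idx}
print(len(train_subjects), "training subjects,", len(test_subjects), "held-out subjects")
```

Splitting by subject rather than by sample means the reported score reflects performance on people the model has never seen, which is usually the relevant case for a deployed smartwatch model.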
