OpenWatch: A Multimodal Benchmark for Hand Gesture Recognition on Smartwatches
Pietro Bonazzi, Youssef Ahmed, Daniel Eckert, Andrea Ronco, Junjie Zeng, Dengxin Dai, Michele Magno

TL;DR
OpenWatch introduces a comprehensive multimodal benchmark for wrist-based gesture recognition on smartwatches, providing new methodologies and insights into modality fusion, model adaptation, and resource efficiency.
Contribution
This work presents the first open-access multimodal benchmark for smartwatch gesture recognition, along with novel methods like MixToken and NormWear-Lora, and empirical guidance for model design.
Findings
PPG signals improve the F1-score by 12.5%.
MixToken outperforms fine-tuned foundation models, with a 90% vs. 66% F1-score.
Task-specific architectures are more accurate and memory-efficient.
Abstract
Despite widespread adoption of smartwatches worldwide, open benchmarks for wrist-based gesture recognition remain surprisingly limited. In this work, we introduce the first open-access multimodal benchmark, OpenWatch, for wrist-based gesture recognition using synchronized inertial and physiological sensing on a commercial smartwatch. It contains over 10 hours of Inertial Measurement Unit (IMU) and Photoplethysmography (PPG) data across 50 participants and a vocabulary of 59 labelled gesture sequences. Furthermore, we present a subject-independent evaluation protocol including traditional and deep learning methods for time-series classification. On top of this, we develop two novel methodologies for hand-gesture recognition: (i) MixToken, a task-specific mixture-of-experts that fuses per-channel IMU filterbank features with cross-channel statistical tokens through learned logit mixing,…
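As an illustration of what a subject-independent evaluation protocol means in practice, the sketch below splits recordings so that no participant contributes data to both the training and the test set. It is a minimal example under stated assumptions, not the benchmark's actual protocol: the function name `subject_independent_split`, the 20% test fraction, the seed, and the toy data layout (50 participants with 4 recordings each) are all hypothetical.

```python
import random


def subject_independent_split(subject_ids, test_fraction=0.2, seed=0):
    """Split sample indices so that no subject appears in both sets.

    subject_ids: one participant identifier per recording.
    test_fraction and seed are illustrative assumptions, not values
    taken from the paper.
    """
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)

    # Hold out whole participants, never individual recordings.
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])

    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx, test_subjects


# Hypothetical toy data: 50 participants, 4 recordings each.
subject_ids = [p for p in range(50) for _ in range(4)]
train_idx, test_idx, test_subjects = subject_independent_split(subject_ids)

# Participants seen in training never appear at test time.
train_subjects = {subject_ids[i] for i in train_idx}
print(len(train_subjects), len(test_subjects))
```

Splitting by participant rather than by recording keeps the reported score from being inflated by a model that memorizes how a particular wearer moves.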
