Efficient Sensor Fusion for Gesture Recognition on Resource-Constrained Devices
Pietro Bartoli, Christian Veronesi, Tommaso Bondini, Andrea Giudici, Franco Zappa

TL;DR
This paper introduces a lightweight, privacy-preserving gesture recognition system for smart eyewear that fuses low-resolution ToF and IR sensors using a compact CNN on microcontrollers, achieving high accuracy and low power consumption.
Contribution
It presents a novel sensor fusion approach and a CNN architecture optimized for resource-constrained devices in gesture recognition applications.
Findings
Achieved 92.3% accuracy on a 7-gesture dataset.
The model has only 6,343 parameters and runs on microcontrollers.
Demonstrated millisecond-scale inference latency at 50 mW power consumption.
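To put the parameter count in perspective, the short calculation below estimates how much memory the weights alone would occupy under a few common storage formats. Only the figure of 6,343 parameters comes from the findings above; the storage formats are assumptions for illustration, since this summary does not say how the weights are stored on the MCU, and the helper name `weight_footprint_bytes` is hypothetical.

```python
# Parameter count reported in the findings.
PARAM_COUNT = 6343

# Bytes per parameter for common storage formats.
# Assumption: the summary does not state which format the deployed model uses.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_footprint_bytes(param_count, dtype):
    """Return the memory needed to store the weights alone, in bytes."""
    return param_count * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        size = weight_footprint_bytes(PARAM_COUNT, dtype)
        print(f"{dtype}: {size} bytes ({size / 1024:.1f} KiB)")
```

Even in 32-bit floating point the weights come to about 25 KB, which excludes activation buffers and runtime overhead.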
Abstract
Gesture recognition is a cornerstone of Human-Computer Interaction (HCI) for smart eyewear, enabling natural and device-free control in augmented reality environments. Traditional vision-based approaches face significant challenges regarding power consumption, computational latency, and user privacy. This paper proposes a lightweight, privacy-preserving gesture recognition system based on the fusion of low-resolution Time-of-Flight (ToF) and Infrared (IR) thermal sensors. We used an 8×8 multizone ToF sensor (VL53L8CH) and an 8×8 IR array (AMG8833) to capture complementary depth and thermal cues. A compact Convolutional Neural Network (CNN) with a specialized grouped-convolution architecture is designed to fuse these modalities efficiently on a microcontroller (MCU). Experimental results on a custom dataset of 7 static gestures, validated via k-fold cross-validation,…
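As a rough illustration of why grouped convolutions suit this kind of two-sensor fusion, the sketch below stacks one 8×8 depth frame and one 8×8 thermal frame into a two-channel input and compares the parameter count of a standard convolution with a grouped one. The layer sizes (8 output channels, 3×3 kernels, 2 groups) and the function names are hypothetical and are not taken from the paper; only the 8×8 frame shape and the use of grouped convolution come from the abstract.

```python
def stack_modalities(tof_frame, ir_frame):
    """Stack two 8x8 frames into a channels-first input of shape [2][8][8].

    Channel 0 holds the ToF depth frame, channel 1 the IR thermal frame.
    """
    for frame in (tof_frame, ir_frame):
        if len(frame) != 8 or any(len(row) != 8 for row in frame):
            raise ValueError("expected an 8x8 frame")
    return [tof_frame, ir_frame]


def conv2d_param_count(in_channels, out_channels, kernel_size, groups=1, bias=True):
    """Count the parameters of a 2D convolution layer.

    With `groups` > 1, each output channel only sees in_channels / groups
    input channels, so the weight count shrinks by a factor of `groups`.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = (in_channels // groups) * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


if __name__ == "__main__":
    # Placeholder frames; real values would come from the two sensors.
    tof = [[0.0] * 8 for _ in range(8)]
    ir = [[0.0] * 8 for _ in range(8)]
    x = stack_modalities(tof, ir)
    print("input shape:", (len(x), len(x[0]), len(x[0][0])))

    # Hypothetical first layer: 2 input channels, 8 output channels, 3x3 kernels.
    print("standard conv params:", conv2d_param_count(2, 8, 3, groups=1))
    print("grouped conv params: ", conv2d_param_count(2, 8, 3, groups=2))
```

With `groups=2`, each modality gets its own set of filters in this layer, and the layer's weight count is halved relative to the standard convolution. Features from the two branches would then need to be mixed by a later layer, which this sketch does not model.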
