Efficient Sensor Fusion for Gesture Recognition on Resource-Constrained Devices

Pietro Bartoli; Christian Veronesi; Tommaso Bondini; Andrea Giudici; Franco Zappa

arXiv:2605.13462 · cs.LG · May 14, 2026


TL;DR

This paper introduces a lightweight, privacy-preserving gesture recognition system for smart eyewear that fuses low-resolution ToF and IR sensors using a compact CNN on microcontrollers, achieving high accuracy and low power consumption.

Contribution

It presents a novel sensor fusion approach and a CNN architecture optimized for resource-constrained devices in gesture recognition applications.

Findings

01

Achieved 92.3% accuracy on a 7-gesture dataset.

02

The model has only 6,343 parameters and runs on a microcontroller.

03

Demonstrated millisecond inference latency with 50 mW power consumption.
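
To put the 6,343-parameter figure in context, the arithmetic below estimates how much memory the weights alone would occupy. This is a rough sketch: the numeric formats are assumptions (the summary does not say how the weights are stored), and activations, code, and buffers are not counted.

```python
# Storage needed for the weights alone, under assumed numeric formats.
PARAMS = 6_343  # parameter count reported in the findings

# Bytes per weight for each format (assumed; the summary does not state one).
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def weight_bytes(n_params: int, fmt: str) -> int:
    """Return the bytes needed to store n_params weights in the given format."""
    return n_params * BYTES_PER_WEIGHT[fmt]


for fmt in BYTES_PER_WEIGHT:
    size = weight_bytes(PARAMS, fmt)
    print(f"{fmt}: {size} bytes ({size / 1024:.1f} KiB)")
```

Even in 32-bit floating point the weights come to 25,372 bytes (about 24.8 KiB), and an 8-bit representation would need 6,343 bytes (about 6.2 KiB).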

Abstract

Gesture recognition is a cornerstone of Human-Computer Interaction (HCI) for smart eyewear, enabling natural and device-free control in augmented reality environments. Traditional vision-based approaches face significant challenges regarding power consumption, computational latency, and user privacy. This paper proposes a lightweight, privacy-preserving gesture recognition system based on the fusion of low-resolution Time-of-Flight (ToF) and Infrared (IR) thermal sensors. We used an 8×8 multizone ToF sensor (VL53L8CH) and an 8×8 IR array (AMG8833) to capture complementary depth and thermal cues. A compact Convolutional Neural Network (CNN) with a specialized grouped-convolution architecture is designed to fuse these modalities efficiently on a microcontroller (MCU). Experimental results on a custom dataset of 7 static gestures, validated via k-fold cross-validation,…
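
The abstract names grouped convolution as the fusion mechanism without giving the layer layout. As a generic illustration of the technique, and not the authors' architecture, the sketch below stacks one 8×8 ToF frame and one 8×8 IR frame as two input channels and applies a grouped convolution with `groups=2`, so each modality is filtered by its own kernels. The channel stacking, the layer sizes, the frame values, and the helper names `conv_params` and `grouped_conv2d` are all assumptions made for the example.

```python
def conv_params(c_in, c_out, k, groups=1, bias=True):
    """Parameter count of a 2D convolution with square k x k kernels."""
    assert c_in % groups == 0 and c_out % groups == 0
    weights = c_out * (c_in // groups) * k * k
    return weights + (c_out if bias else 0)


def grouped_conv2d(x, w, groups):
    """Stride-1 grouped convolution without padding or bias.

    x: input as [c_in][H][W] nested lists
    w: weights as [c_out][c_in // groups][k][k] nested lists
    """
    c_in, c_out = len(x), len(w)
    k = len(w[0][0])
    h_out, w_out = len(x[0]) - k + 1, len(x[0][0]) - k + 1
    in_per_g, out_per_g = c_in // groups, c_out // groups
    out = []
    for o in range(c_out):
        g = o // out_per_g  # group this output channel belongs to
        fmap = [[0.0] * w_out for _ in range(h_out)]
        for ci in range(in_per_g):
            src = x[g * in_per_g + ci]  # only channels from the same group
            for i in range(h_out):
                for j in range(w_out):
                    fmap[i][j] += sum(
                        w[o][ci][di][dj] * src[i + di][j + dj]
                        for di in range(k)
                        for dj in range(k)
                    )
        out.append(fmap)
    return out


# Hypothetical frames: channel 0 = ToF depth, channel 1 = IR temperature.
tof = [[1.0] * 8 for _ in range(8)]
ir = [[2.0] * 8 for _ in range(8)]

# Two output channels, one per modality, each a 3x3 averaging kernel.
kernel = [[1.0 / 9] * 3 for _ in range(3)]
weights = [[kernel], [kernel]]

y = grouped_conv2d([tof, ir], weights, groups=2)
print(len(y), len(y[0]), len(y[0][0]))  # channels, height, width

# Illustrative layer: 2 input channels, 8 output channels, 3x3 kernels.
print(conv_params(2, 8, 3, groups=1), conv_params(2, 8, 3, groups=2))
```

For this illustrative layer, grouping reduces the parameter count from 152 to 80, because each output channel sees only one of the two input channels. That saving is the usual reason to prefer grouped convolutions when the parameter budget is tight.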
