Dynamic LIBRAS Gesture Recognition via CNN over Spatiotemporal Matrix Representation

Jasmine Moreira

arXiv:2603.25863·cs.CV·March 30, 2026

TL;DR

This paper introduces a CNN-based method for real-time recognition of LIBRAS gestures from skeletal hand keypoints, reaching 92% to 95% accuracy in tests for a home automation system.

Contribution

The method arranges the 21 hand keypoints extracted by the MediaPipe Hand Landmarker into a 90 by 21 spatiotemporal matrix and classifies that matrix with a CNN, which allows continuous gesture recognition without recurrent networks.

Findings

01. 95% accuracy under low-light conditions

02. 92% accuracy under normal lighting

03. Effective for real-time device control

Abstract

This paper proposes a method for dynamic hand gesture recognition based on the composition of two models: the MediaPipe Hand Landmarker, which extracts 21 skeletal keypoints of the hand, and a convolutional neural network (CNN) trained to classify gestures from a 90 by 21 spatiotemporal matrix representation of those keypoints. The method is applied to the recognition of LIBRAS (Brazilian Sign Language) gestures for device control in a home automation system, covering 11 classes of static and dynamic gestures. For real-time inference, a sliding window with temporal frame triplication is used, enabling continuous recognition without recurrent networks. Tests achieved 95% accuracy under low-light conditions and 92% under normal lighting. The results indicate that the approach is effective, although systematic experiments with greater user diversity are needed…
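
The sliding window with frame triplication described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation. The abstract gives only the matrix size (90 by 21) and the triplication, so the 30-frame window length (90 / 3), the (x, y, z) tuple stored in each cell, and the names SlidingGestureWindow and fake_frame are all assumptions made here for illustration. The landmark detector is replaced by a stand-in function so that the sketch runs with the standard library alone.

```python
from collections import deque

NUM_KEYPOINTS = 21   # from the abstract: hand keypoints per frame
MATRIX_ROWS = 90     # from the abstract: temporal dimension of the matrix
REPEAT = 3           # from the abstract: "temporal frame triplication"
# Assumption: 90 rows come from 30 captured frames, each repeated 3 times.
WINDOW_FRAMES = MATRIX_ROWS // REPEAT


class SlidingGestureWindow:
    """Hypothetical helper: keeps the latest frames as a 90 x 21 matrix."""

    def __init__(self):
        # A bounded deque drops the oldest frame when a new one arrives.
        self.frames = deque(maxlen=WINDOW_FRAMES)

    def push(self, keypoints):
        """Add one frame of 21 keypoints, each an (x, y, z) tuple (assumed)."""
        if len(keypoints) != NUM_KEYPOINTS:
            raise ValueError(
                f"expected {NUM_KEYPOINTS} keypoints, got {len(keypoints)}"
            )
        self.frames.append([tuple(p) for p in keypoints])

    def ready(self):
        """True once enough frames have been captured to fill the matrix."""
        return len(self.frames) == WINDOW_FRAMES

    def matrix(self):
        """Return 90 rows x 21 columns; each frame appears 3 times in a row."""
        if not self.ready():
            raise RuntimeError("window not full yet")
        return [list(frame) for frame in self.frames for _ in range(REPEAT)]


def fake_frame(t):
    """Hypothetical stand-in for the landmark detector output at step t."""
    return [(t / 100.0, k / 100.0, 0.0) for k in range(NUM_KEYPOINTS)]


window = SlidingGestureWindow()
for t in range(35):              # 35 frames arrive; only the last 30 are kept
    window.push(fake_frame(t))

m = window.matrix()
print(len(m), len(m[0]))         # rows and columns of the matrix
```

In a real pipeline, each call to push would receive the detector's keypoints for the current camera frame, and matrix() would be passed to the classifier whenever ready() is true, so that a prediction can be made on every new frame without any recurrent state.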
