Dynamic LIBRAS Gesture Recognition via CNN over Spatiotemporal Matrix Representation
Jasmine Moreira

TL;DR
This paper introduces a CNN-based method for real-time dynamic LIBRAS gesture recognition from skeletal hand keypoints, reporting 92% to 95% accuracy in tests for a home automation application.
Contribution
It combines MediaPipe Hand Landmarker keypoints with a CNN that classifies 90 by 21 spatiotemporal matrices, recognizing static and dynamic gestures without recurrent networks.
Findings
95% accuracy under low-light conditions
92% accuracy under normal lighting
Effective for real-time device control
Abstract
This paper proposes a method for dynamic hand gesture recognition based on the composition of two models: the MediaPipe Hand Landmarker, responsible for extracting 21 skeletal keypoints of the hand, and a convolutional neural network (CNN) trained to classify gestures from a spatiotemporal matrix representation of dimensions 90 by 21 of those keypoints. The method is applied to the recognition of LIBRAS (Brazilian Sign Language) gestures for device control in a home automation system, covering 11 classes of static and dynamic gestures. For real-time inference, a sliding window with temporal frame triplication is used, enabling continuous recognition without recurrent networks. Tests achieved 95% accuracy under low-light conditions and 92% under normal lighting. The results indicate that the approach is effective, although systematic experiments with greater user diversity are needed…
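One way to picture the sliding window with frame triplication is the minimal Python sketch below. It is not the paper's implementation. The abstract gives only the 21 keypoints and the 90 by 21 matrix size; everything else here is an assumption made for illustration: that "triplication" means repeating each captured frame three times along the time axis, that the window therefore holds 30 captured frames, and that each keypoint is stored as an (x, y, z) tuple. The class and method names are hypothetical.

```python
from collections import deque

NUM_KEYPOINTS = 21   # from the abstract: keypoints per hand
MATRIX_ROWS = 90     # from the abstract: temporal dimension of the matrix
REPEAT = 3           # assumption: each captured frame is repeated 3 times
WINDOW_FRAMES = MATRIX_ROWS // REPEAT  # assumption: 30 captured frames per window


class SlidingWindow:
    """Hypothetical buffer holding the most recent frames of hand keypoints."""

    def __init__(self):
        # A bounded deque drops the oldest frame automatically when full.
        self.frames = deque(maxlen=WINDOW_FRAMES)

    def push(self, keypoints):
        """Add one frame: a sequence of 21 keypoints, e.g. (x, y, z) tuples."""
        if len(keypoints) != NUM_KEYPOINTS:
            raise ValueError("expected 21 keypoints per frame")
        self.frames.append(list(keypoints))

    def ready(self):
        """True once enough frames are buffered to fill the matrix."""
        return len(self.frames) == WINDOW_FRAMES

    def to_matrix(self):
        """Return a 90 x 21 nested list, repeating each frame REPEAT times."""
        return [list(frame) for frame in self.frames for _ in range(REPEAT)]
```

In this sketch, each new camera frame would be pushed into the window, and once `ready()` returns true the result of `to_matrix()` could be handed to the classifier on every subsequent frame, which is what would allow continuous recognition without a recurrent model.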
