ConvMixFormer- A Resource-efficient Convolution Mixer for Transformer-based Dynamic Hand Gesture Recognition
Mallika Garg, Debashis Ghosh, and Pyari Mohan Pradhan

TL;DR
ConvMixFormer is a resource-efficient convolutional transformer architecture designed for dynamic hand gesture recognition, replacing heavy self-attention with convolutional token mixers to reduce complexity and improve efficiency.
Contribution
The paper introduces ConvMixFormer, a novel transformer built on a convolutional token mixer, which reduces computational cost and parameter count while capturing local features for gesture recognition.
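A minimal pure-Python sketch can make the idea of a convolutional token mixer concrete. It assumes a depthwise 1D convolution along the token (frame) axis with zero "same" padding, followed by a residual connection. The function names `depthwise_conv_mixer` and `conv_mixer_block`, the kernel layout, and the omission of normalization and the feed-forward network are hypothetical simplifications, not the authors' exact ConvMixFormer layer. Each output token depends only on its k nearest neighbours, which is how a convolution captures local structure at linear cost in the number of tokens.

```python
from typing import List

# Hypothetical simplification for illustration, not the ConvMixFormer authors' code.
Matrix = List[List[float]]  # shape (T, d): T tokens (e.g. video frames), d channels


def depthwise_conv_mixer(tokens: Matrix, kernel: Matrix) -> Matrix:
    """Mix each channel along the token axis with its own 1D kernel.

    tokens: T x d input sequence.
    kernel: k x d weights (k odd); kernel[j][c] weights the token at
            offset j - k // 2 for channel c.
    Zero padding keeps the output length equal to T.
    """
    T, d = len(tokens), len(tokens[0])
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel size must be odd for symmetric 'same' padding")
    half = k // 2
    out = [[0.0] * d for _ in range(T)]
    for t in range(T):
        for j in range(k):
            src = t + j - half  # neighbouring token contributing to position t
            if 0 <= src < T:    # positions outside the sequence act as zeros
                for c in range(d):
                    out[t][c] += kernel[j][c] * tokens[src][c]
    return out


def conv_mixer_block(tokens: Matrix, kernel: Matrix) -> Matrix:
    """Residual token-mixing step x + mixer(x), standing in for x + attention(x)."""
    mixed = depthwise_conv_mixer(tokens, kernel)
    return [[x + m for x, m in zip(row, mrow)] for row, mrow in zip(tokens, mixed)]
```

For example, with three 2-channel tokens `[[1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]` and an all-ones kernel of size 3, `conv_mixer_block` adds each token to the sum of itself and its neighbours, giving `[[4.0, 1.0], [8.0, 2.0], [8.0, 1.0]]`. A depthwise kernel is chosen here only because it is the cheapest convolution that mixes tokens; cross-channel mixing would be left to other layers.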
Findings
Achieved state-of-the-art results on the NVIDIA Dynamic Hand Gesture and Briareo datasets.
The model uses nearly half the parameters of a standard transformer.
Demonstrated superior efficiency and accuracy with multimodal inputs.
Abstract
Transformer models have demonstrated remarkable success in many domains, such as natural language processing (NLP) and computer vision. With growing interest in transformer-based architectures, they are now also being used for gesture recognition. We therefore explore this direction and devise a novel ConvMixFormer architecture for dynamic hand gestures. Because self-attention scales quadratically with the length of the input sequence, transformers are computationally complex and heavy. To address this drawback, we design a resource-efficient model that replaces the self-attention in the transformer with a simple convolutional-layer-based token mixer. The convolution-based mixer requires less computation and fewer parameters than quadratic self-attention. The convolution mixer helps the model capture the local spatial features…
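The quadratic-versus-linear argument can be checked with back-of-the-envelope arithmetic. The sketch below counts multiply-accumulates (MACs) and parameters for one self-attention layer and for a depthwise 1D convolution mixer over T tokens of width d. The function names, the choice of a depthwise convolution as the mixer, and the example sizes (d = 512, k = 3, T = 40/80/160) are illustrative assumptions, not figures from the paper; softmax and normalization costs are ignored.

```python
# Illustrative cost model; all names and sizes are assumptions, not from the paper.


def attention_macs(num_tokens: int, dim: int) -> int:
    """MACs for one self-attention layer (the head count does not change the total).

    4 * T * d^2 for the Q, K, V and output projections, plus
    2 * T^2 * d for Q K^T and for weighting V by the T x T attention map.
    """
    return 4 * num_tokens * dim * dim + 2 * num_tokens * num_tokens * dim


def depthwise_conv_macs(num_tokens: int, dim: int, kernel_size: int) -> int:
    """MACs for a depthwise 1D conv mixer, counting every kernel tap (padding included)."""
    return num_tokens * dim * kernel_size


def attention_params(dim: int) -> int:
    """Weights plus biases of the four d x d projections."""
    return 4 * (dim * dim + dim)


def depthwise_conv_params(dim: int, kernel_size: int) -> int:
    """One length-k kernel per channel, plus one bias per channel."""
    return dim * kernel_size + dim


if __name__ == "__main__":
    d, k = 512, 3  # hypothetical width and kernel size
    for T in (40, 80, 160):  # hypothetical sequence lengths
        print(T, attention_macs(T, d), depthwise_conv_macs(T, d, k))
    print("params:", attention_params(d), depthwise_conv_params(d, k))
```

Doubling T exactly doubles the convolution cost but more than doubles the attention cost, because the 2·T²·d term quadruples. Note that the depthwise mixer performs no cross-channel mixing, so this compares token-mixing operators only; it does not reproduce the whole-model parameter savings reported for ConvMixFormer.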
Taxonomy
Topics: Hand Gesture Recognition Systems · Robotics and Automated Systems · Gaze Tracking and Assistive Technology
Methods: Softmax · Attention Is All You Need
