ConvMixFormer- A Resource-efficient Convolution Mixer for Transformer-based Dynamic Hand Gesture Recognition

Mallika Garg, Debashis Ghosh, and Pyari Mohan Pradhan

arXiv:2411.07118 · cs.CV · December 24, 2024

TL;DR

ConvMixFormer is a resource-efficient convolutional transformer architecture designed for dynamic hand gesture recognition, replacing heavy self-attention with convolutional token mixers to reduce complexity and improve efficiency.

Contribution

The paper introduces ConvMixFormer, a novel convolutional mixer-based transformer that reduces computational cost and parameter count while capturing local features for gesture recognition.
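This page does not describe the mixer's internals (kernel size, whether it convolves along frames or over spatial feature maps, or which normalization it uses). As a hedged illustration of the general idea only, the pure-Python sketch below assumes a depthwise 1D convolution along the token axis, placed where self-attention would sit in a pre-norm residual block. The names `depthwise_conv_mixer`, `layer_norm`, and `conv_mixer_block`, and all shapes, are hypothetical and are not taken from the official repository.

```python
from typing import List

Matrix = List[List[float]]  # rows = tokens (e.g. frames), columns = channels


def depthwise_conv_mixer(x: Matrix, kernels: Matrix, bias: List[float]) -> Matrix:
    """Mix each channel across neighbouring tokens with its own length-k kernel.

    Hypothetical sketch: computed as a cross-correlation (no kernel flip), with
    zero padding so the output keeps the same number of tokens. k must be odd.
    """
    n, d = len(x), len(x[0])
    k = len(kernels[0])
    half = k // 2
    out = [[0.0] * d for _ in range(n)]
    for t in range(n):
        for c in range(d):
            acc = bias[c]
            for j in range(k):
                src = t + j - half
                if 0 <= src < n:  # positions outside the sequence count as zero
                    acc += kernels[c][j] * x[src][c]
            out[t][c] = acc
    return out


def layer_norm(v: List[float], eps: float = 1e-5) -> List[float]:
    # Normalise one token over its channels (learnable scale/shift omitted)
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / (var + eps) ** 0.5 for a in v]


def conv_mixer_block(x: Matrix, kernels: Matrix, bias: List[float]) -> Matrix:
    # Pre-norm residual block: x + Mixer(LayerNorm(x)); this is the slot that
    # self-attention occupies in a standard transformer block
    normed = [layer_norm(tok) for tok in x]
    mixed = depthwise_conv_mixer(normed, kernels, bias)
    return [[a + b for a, b in zip(xt, mt)] for xt, mt in zip(x, mixed)]


if __name__ == "__main__":
    x = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]  # 3 tokens, 2 channels
    # channel 0 sums each token with its neighbours; channel 1 is an identity kernel
    kernels = [[1.0, 1.0, 1.0], [0.0, 1.0, 0.0]]
    print(depthwise_conv_mixer(x, kernels, [0.0, 0.0]))  # [[3.0, 0.0], [6.0, 1.0], [5.0, 0.0]]
```

Each output token depends only on its k neighbouring tokens, which is the kind of locality a convolutional mixer provides. A practical model would also need learnable normalization parameters, a feed-forward sublayer, and batched tensor operations.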

Findings

01. Achieved state-of-the-art results on the NVIDIA and Briareo datasets.

02. The model uses nearly half the parameters of a conventional self-attention transformer.

03. Demonstrated superior efficiency and accuracy with multimodal inputs.

Abstract

Transformer models have demonstrated remarkable success in many domains, such as natural language processing (NLP) and computer vision, and with growing interest in transformer-based architectures they are now also used for gesture recognition. We therefore explore this direction and devise a novel ConvMixFormer architecture for dynamic hand gesture recognition. The cost of self-attention scales quadratically with the length of the input sequence, which makes transformers computationally complex and heavy. To address this drawback, we design a resource-efficient model that replaces the self-attention in the transformer with a simple convolutional-layer-based token mixer. The convolution-based mixer requires less computation and fewer parameters than quadratic self-attention. The convolution mixer helps the model capture the local spatial features…
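The abstract's cost argument can be made concrete with rough per-layer counts. The sketch below compares one multi-head self-attention layer (Q, K, V and output projections, each with a bias) against a depthwise 1D convolution mixer with kernel size k. Both layer definitions and the example sizes n = 64 tokens, d = 256 channels, k = 3 are illustrative assumptions, not the paper's configuration. MACs are multiply-accumulate operations; softmax, bias additions, and normalization are ignored.

```python
def attention_params(d: int) -> int:
    # Q, K, V and output projections, each a d x d weight plus a d-dim bias
    return 4 * (d * d + d)


def attention_macs(n: int, d: int) -> int:
    # 4 projections over n tokens, plus Q K^T scores (n*n*d)
    # and the score-weighted sum of V (n*n*d)
    return 4 * n * d * d + 2 * n * n * d


def dwconv_params(d: int, k: int) -> int:
    # one length-k kernel plus one bias per channel (assumed depthwise mixer)
    return d * k + d


def dwconv_macs(n: int, d: int, k: int) -> int:
    # each of the n*d outputs is a k-tap weighted sum
    return n * d * k


if __name__ == "__main__":
    n, d, k = 64, 256, 3  # hypothetical sequence length, width, kernel size
    print(attention_params(d), dwconv_params(d, k))  # 263168 1024
    print(attention_macs(n, d), dwconv_macs(n, d, k))  # 18874368 49152
```

The attention count has an n²·d term from forming and applying the n × n score matrix, so doubling the sequence length quadruples that term while the convolution cost only doubles. Whole-model savings are far smaller than this per-layer ratio (the Findings report roughly half the parameters), presumably because other layers, such as the feed-forward sublayers, still contribute parameters.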

Code & Models

Repositories

mallikagarg/convmixformer
PyTorch · Official

Taxonomy

Topics: Hand Gesture Recognition Systems · Robotics and Automated Systems · Gaze Tracking and Assistive Technology

Methods: Softmax · Attention Is All You Need