Distilling Knowledge from CNN-Transformer Models for Enhanced Human Action Recognition
Hamid Ahmadabadi, Omid Nejati Manzari, Ahmad Ayatollahi

TL;DR
This paper explores enhancing human action recognition by distilling knowledge from CNN and Transformer models, demonstrating significant accuracy improvements through a novel combination of local and global feature extraction techniques.
Contribution
The paper introduces a knowledge distillation framework in which a CNN teacher model transfers knowledge to a Transformer-based student model, improving performance on human action recognition tasks.
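This summary does not state which distillation objective the paper uses. As a minimal sketch, assuming the widely used soft-target formulation of Hinton et al. (2015), the student is trained on a weighted sum of (a) cross-entropy against the ground-truth action label and (b) the KL divergence between the temperature-softened teacher and student distributions, scaled by T². The helper names, `temperature=4.0`, `alpha=0.5`, and the example logits are hypothetical illustration choices, not values from the paper.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; entries with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    label: int,
    temperature: float = 4.0,
    alpha: float = 0.5,
) -> float:
    """Hypothetical KD objective: alpha * CE + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    # Hard-label term: cross-entropy of the student against the ground-truth action class.
    hard = -math.log(softmax(student_logits)[label])
    # Soft-label term: match the temperature-softened teacher distribution.
    # Scaling by T^2 keeps its gradient magnitude comparable to the hard term.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = temperature ** 2 * kl_divergence(p_teacher, p_student)
    return alpha * hard + (1.0 - alpha) * soft


# Example: logits for three action classes (made-up numbers, not results from the paper).
teacher = [2.0, 0.5, -1.0]  # e.g. output of a CNN teacher
student = [1.0, 0.8, -0.5]  # e.g. output of a ViT student
loss = distillation_loss(student, teacher, label=0)
```

When the student's logits already match the teacher's, the soft term vanishes and only the weighted cross-entropy remains; a higher temperature exposes more of the teacher's relative confidence across non-target classes.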
Findings
Knowledge distillation improves accuracy and mAP in action recognition.
Transformer-based models effectively capture global image dependencies.
Combining CNN and Transformer features enhances recognition performance.
Abstract
This paper presents a study on improving human action recognition through knowledge distillation and the combination of CNN and ViT models. The research aims to enhance the performance and efficiency of smaller student models by transferring knowledge from larger teacher models. The proposed method employs a vision Transformer network as the student model and a convolutional network as the teacher model. The teacher model extracts local image features, whereas the student model focuses on global features using an attention mechanism. The Vision Transformer (ViT) architecture is introduced as a robust framework for capturing global dependencies in images. Additionally, advanced variants of ViT, namely PVT, ConViT, MViT, Swin Transformer, and Twins, are discussed, highlighting their contributions to computer vision tasks. The ConvNeXt model is introduced as a…
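The local-versus-global distinction between teacher and student can be illustrated on a toy sequence of patch embeddings. This is a sketch under simplifying assumptions, not the paper's architecture: `self_attention` is single-head scaled dot-product attention with identity query/key/value projections, and `local_conv1d` is a hypothetical 3-wide mean filter standing in for a learned convolution kernel. Perturbing the last patch changes the attention output at the first position, but not the convolution output there.

```python
import math
from typing import List

Vector = List[float]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    outputs = []
    for q in tokens:
        # Every query is scored against every token: a global receptive field.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return outputs


def local_conv1d(tokens: List[Vector], kernel_size: int = 3) -> List[Vector]:
    """Mean filter over neighbouring tokens (zero padding): a local receptive field."""
    d = len(tokens[0])
    n = len(tokens)
    half = kernel_size // 2
    out = []
    for i in range(n):
        acc = [0.0] * d
        for j in range(i - half, i + half + 1):
            if 0 <= j < n:  # positions outside the sequence count as zeros
                for c in range(d):
                    acc[c] += tokens[j][c]
        out.append([a / kernel_size for a in acc])
    return out


# Six toy 2-D patch embeddings; then perturb only the last patch.
patches = [[0.1 * i, 1.0 - 0.1 * i] for i in range(6)]
perturbed = [p[:] for p in patches]
perturbed[-1] = [5.0, -5.0]

attn_changed = self_attention(patches)[0] != self_attention(perturbed)[0]  # True
conv_changed = local_conv1d(patches)[0] != local_conv1d(perturbed)[0]  # False
```

Stacking convolution layers enlarges the receptive field gradually, while a single attention layer already relates every patch to every other patch; this is the complementary behaviour the abstract attributes to the CNN teacher and the Transformer student.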
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · Anomaly Detection Techniques and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Residual Connection · Byte Pair Encoding · Dense Connections · Layer Normalization · Stochastic Depth · Label Smoothing
