Lightweight Multi-Scale Framework for Human Pose and Action Classification
Alireza Saber, Mohammad-Mehdi Hosseini, Amirreza Fateh, Mansoor Fateh, Vahid Abolghasemi

TL;DR
This paper introduces a lightweight deep learning framework for human pose and action classification that achieves high accuracy with a small model size.
Contribution
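The contribution is a modular attention-based architecture that pairs a Swin Transformer backbone with three attention modules (Spatial Attention, Context-Aware Channel Attention, and a novel Dual Weighted Cross Attention) to fuse multi-scale spatial and channel-wise features.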
Findings
The model achieves 90.40% accuracy on the 6-class Yoga-82 dataset.
It outperforms state-of-the-art methods on Stanford 40 Actions with 94.28% accuracy.
The model maintains high performance with only 0.79 million parameters.
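To put the parameter count in context, the short calculation below estimates the storage needed for the weights alone. It is a rough sketch: the figure of 790,000 parameters is the rounded value reported above, and the 4-byte (float32) and 2-byte (float16) precisions are assumptions for illustration, not details taken from the paper. The function name weight_megabytes is hypothetical.

```python
def weight_megabytes(num_params, bytes_per_param):
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1_000_000


# 0.79 million parameters is the rounded count reported in the findings.
NUM_PARAMS = 790_000

# Assumed precisions, for illustration only.
fp32_mb = weight_megabytes(NUM_PARAMS, 4)  # 3.16 MB
fp16_mb = weight_megabytes(NUM_PARAMS, 2)  # 1.58 MB

print(f"float32 weights: {fp32_mb:.2f} MB")
print(f"float16 weights: {fp16_mb:.2f} MB")
```

This counts weights only; activations, optimizer state, and file-format overhead are not included.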
Abstract
Human pose classification, along with related tasks such as action recognition, is a crucial area in deep learning due to its wide range of applications in assisting human activities. Despite significant progress, it remains a challenging problem because of high inter-class similarity, dataset noise, and the large variability in human poses. In this paper, we propose a lightweight yet highly effective modular attention-based architecture for human pose classification, built upon a Swin Transformer backbone for robust multi-scale feature extraction. The proposed design integrates the Spatial Attention module, the Context-Aware Channel Attention Module, and a novel Dual Weighted Cross Attention module, enabling effective fusion of spatial and channel-wise cues. Additionally, explainable AI techniques are employed to improve the reliability and interpretability of the model. We train and…
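The general idea of combining channel-wise and spatial cues can be illustrated with a toy example. The sketch below is a generic, parameter-free simplification written for this summary; it is not the paper's Spatial Attention, Context-Aware Channel Attention Module, or Dual Weighted Cross Attention module, whose details are not given here. All function names are hypothetical, and real attention modules would use learned weights rather than plain averages.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """Scale each channel by a gate computed from its global average.

    feat is a list of C channels, each an H x W nested list of floats.
    """
    out = []
    for channel in feat:
        values = [v for row in channel for v in row]
        gate = sigmoid(sum(values) / len(values))
        out.append([[v * gate for v in row] for row in channel])
    return out


def spatial_attention(feat):
    """Scale each spatial position by a gate computed from the channel mean."""
    c = len(feat)
    h, w = len(feat[0]), len(feat[0][0])
    gates = [
        [sigmoid(sum(feat[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]
    return [
        [[feat[k][i][j] * gates[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


def fuse(feat):
    """Apply the channel gate first, then the spatial gate (an assumed order)."""
    return spatial_attention(channel_attention(feat))


# A tiny 2-channel, 2 x 2 feature map with made-up values.
feat = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
fused = fuse(feat)
```

Because both gates lie strictly between 0 and 1, the output keeps the input's shape while damping every positive activation; a learned version would instead emphasize informative channels and regions relative to the rest.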
Taxonomy
Topics: Human Pose and Action Recognition · Action Observation and Synchronization · Robot Manipulation and Learning
