Distilled Mid-Fusion Transformer Networks for Multi-Modal Human Activity Recognition
Jingcheng Li, Lina Yao, Binghao Li, Claude Sammut

TL;DR
This paper introduces DMFT, a knowledge distillation-based multi-modal transformer network that effectively fuses spatial-temporal features for human activity recognition, achieving high performance with reduced complexity suitable for edge deployment.
Contribution
The paper proposes a novel multi-modal mid-fusion transformer architecture with knowledge distillation, improving feature extraction and fusion efficiency for human activity recognition.
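As a rough illustration of the knowledge-distillation idea named above, the following is a minimal sketch of the generic temperature-scaled distillation loss (soft teacher targets combined with the hard label). It is not the paper's exact objective: the function names, the `temperature` and `alpha` parameters, and their default values are assumptions made for illustration only.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Generic distillation loss (hypothetical helper, not from the paper).

    alpha weights the hard-label cross-entropy; (1 - alpha) weights the
    KL divergence between softened teacher and student distributions.
    """
    # Hard-label term: cross-entropy of the student at temperature 1.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[label])

    # Soft-target term: KL(teacher || student) at the given temperature.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(s))
             for t, s in zip(q_teacher, q_student) if t > 0)

    # The temperature**2 factor keeps the soft term's gradient scale
    # comparable to the hard term as the temperature changes.
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl

# Usage: a small student is penalized for disagreeing with the teacher.
loss = distillation_loss([1.0, 0.5, 2.0], [0.2, 0.1, 3.0], label=2)
```

In this sketch, a larger `temperature` spreads the teacher's probability mass over more classes, so the student also learns from the relative ranking of incorrect activities rather than only from the ground-truth label.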
Findings
DMFT achieves competitive accuracy on public datasets.
The approach enhances robustness and scalability.
The student model reduces complexity for edge deployment.
Abstract
Human Activity Recognition is an important task in many human-computer collaborative scenarios and has various practical applications. Although uni-modal approaches have been extensively studied, they suffer from data-quality issues and require modality-specific feature engineering, so they are not robust and effective enough for real-world deployment. By drawing on various sensors, Multi-modal Human Activity Recognition can use complementary information to build models that generalize well. While deep learning methods have shown promising results, their potential for extracting salient multi-modal spatial-temporal features and better fusing complementary information has not been fully explored. Reducing the complexity of the multi-modal approach for edge deployment also remains an open problem. To resolve the issues, a knowledge distillation-based Multi-modal…
Taxonomy
Topics: Context-Aware Activity Recognition Systems · Human Pose and Action Recognition
Methods: Knowledge Distillation
