TL;DR
This paper introduces SAKDN, a framework that enhances video-based action recognition by adaptively distilling knowledge from wearable sensors transformed into images, effectively addressing modality differences and improving recognition accuracy.
Contribution
The paper proposes a novel semantics-aware adaptive knowledge distillation framework that fuses multi-modal sensor data and employs graph-guided loss for improved action recognition.
Findings
Effective knowledge transfer from wearable sensors to video models.
Improved recognition accuracy on multiple datasets.
Novel fusion and loss mechanisms for multi-modal learning.
Abstract
Existing vision-based action recognition is susceptible to occlusion and appearance variations, while wearable sensors can alleviate these challenges by capturing human motion with one-dimensional time-series signal. For the same action, the knowledge learned from vision sensors and wearable sensors, may be related and complementary. However, there exists significantly large modality difference between action data captured by wearable-sensor and vision-sensor in data dimension, data distribution and inherent information content. In this paper, we propose a novel framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in vision-sensor modality (videos) by adaptively transferring and distilling the knowledge from multiple wearable sensors. The SAKDN uses multiple wearable-sensors as teacher modalities and uses RGB videos as student…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
MethodsKnowledge Distillation
