TL;DR
This paper introduces a novel multimodal self-supervised learning framework called CMC-CMKM for human activity recognition, leveraging cross-modal knowledge to improve recognition accuracy across various scenarios.
Contribution
The paper proposes a flexible framework that enhances contrastive self-supervised learning by incorporating cross-modal knowledge mining for multimodal human activity recognition.
Findings
CMC-CMKM significantly outperforms unimodal and multimodal baselines in experiments.
It achieves performance competitive with supervised methods.
It is effective in fully-supervised, retrieval, and semi-supervised scenarios.
Abstract
Human Activity Recognition is a field of research where input data can take many forms. Each of the possible input modalities describes human behaviour in a different way, and each has its own strengths and weaknesses. We explore the hypothesis that leveraging multiple modalities can lead to better recognition. Since manual annotation of input data is expensive and time-consuming, the emphasis is placed on self-supervised methods, which can learn useful feature representations without any ground truth labels. We extend a number of recent contrastive self-supervised approaches for the task of Human Activity Recognition, leveraging inertial and skeleton data. Furthermore, we propose a flexible, general-purpose framework for performing multimodal self-supervised learning, named Contrastive Multiview Coding with Cross-Modal Knowledge Mining (CMC-CMKM). This framework exploits modality-specific…
Taxonomy
Methods: InfoNCE · Contrastive Multiview Coding
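For readers unfamiliar with the two methods listed above, the following is a minimal sketch of a symmetric InfoNCE objective applied across two modalities, in the spirit of Contrastive Multiview Coding. It is a generic illustration, not the paper's implementation: the function names `info_nce` and `cmc_loss`, the temperature value, and the assumption that embeddings from the inertial and skeleton encoders arrive as row-aligned NumPy arrays are all hypothetical, and the cross-modal knowledge mining step of CMC-CMKM is not shown.

```python
import numpy as np


def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss where row i of z_a and row i of z_b form the positive pair.

    All other rows of z_b in the batch serve as negatives for row i of z_a.
    """
    # L2-normalise so that dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    # Pairwise similarity matrix of shape (N, N), scaled by the temperature.
    logits = z_a @ z_b.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positive pairs sit on the diagonal.
    return -np.mean(np.diag(log_prob))


def cmc_loss(z_inertial, z_skeleton, temperature=0.1):
    """Symmetric two-view contrastive loss (assumed form for illustration)."""
    return 0.5 * (
        info_nce(z_inertial, z_skeleton, temperature)
        + info_nce(z_skeleton, z_inertial, temperature)
    )


# Toy usage with random stand-ins for encoder outputs (hypothetical data).
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_aligned = cmc_loss(z, z)                        # views agree per sample
loss_shuffled = cmc_loss(z, np.roll(z, 1, axis=0))   # views mismatched
```

In this toy setup the aligned pairing yields a lower loss than the mismatched one, which is the behaviour a contrastive objective of this kind is designed to reward.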
