Contrastive Learning for Multimodal Human Activity Recognition with Limited Labeled Data
Long Jing, Zhixiong Yang, Yajun Zhang, Xinlong Feng

TL;DR
This paper introduces CLMM, a contrastive learning framework that enhances multimodal human activity recognition with limited labeled data through a two-stage training process.
Contribution
The paper presents a novel two-stage contrastive learning approach with a CNN-DiffTransformer encoder and dual-branch architecture for improved recognition.
Findings
CLMM outperforms state-of-the-art methods in accuracy.
CLMM demonstrates faster convergence in experiments.
The framework effectively utilizes limited labeled data.
Abstract
Human activity recognition serves as the foundation for various emerging applications. In recent years, researchers have used collaborative sensing of multi-source sensors to capture complex and dynamic human activities. However, multimodal human activity sensing typically encounters highly heterogeneous data across modalities and label scarcity, resulting in an application gap between existing solutions and real-world needs. In this paper, we propose CLMM, a general contrastive learning framework for human activity recognition that achieves effective multimodal recognition with limited labeled data. CLMM employs a novel two-stage training strategy. In the first stage, CLMM employs a CNN-DiffTransformer encoder to capture cross-modal shared information by extracting local and global features. Meanwhile, a hard-positive samples weighting algorithm enhances gradient propagation to…
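The abstract is cut off before it describes the training objective, so the following is only a generic sketch of a cross-modal contrastive (InfoNCE-style) loss of the kind such frameworks commonly build on. It is not CLMM's actual loss. The function names, the `temperature` default, and in particular the `hard_positive_gamma` weighting rule are assumptions made for illustration; the paper's hard-positive weighting algorithm is not specified in the text above.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine_sim_matrix(a, b):
    """Pairwise cosine similarity between two lists of embedding vectors."""
    a = [l2_normalize(v) for v in a]
    b = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]

def cross_modal_info_nce(za, zb, temperature=0.1, hard_positive_gamma=0.0):
    """One-directional InfoNCE loss between two modalities.

    za[i] and zb[i] are embeddings of the same time window from two
    different sensors (the positive pair); every zb[j] with j != i acts
    as a negative for za[i].

    hard_positive_gamma is a HYPOTHETICAL knob, not the paper's algorithm:
    each sample's loss is weighted by 1 + gamma * (1 - sim(za[i], zb[i])),
    so positives that are still far apart contribute more.
    """
    sims = cosine_sim_matrix(za, zb)
    weighted_sum, weight_total = 0.0, 0.0
    for i, row in enumerate(sims):
        logits = [s / temperature for s in row]
        # Numerically stable log-sum-exp over all candidates in modality b.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss_i = log_denom - logits[i]
        weight_i = 1.0 + hard_positive_gamma * (1.0 - row[i])
        weighted_sum += weight_i * loss_i
        weight_total += weight_i
    return weighted_sum / weight_total

# Toy embeddings: two windows, two modalities (e.g. accelerometer, gyroscope).
accel = [[1.0, 0.0], [0.0, 1.0]]
gyro_aligned = [[1.0, 0.0], [0.0, 1.0]]
gyro_shuffled = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = cross_modal_info_nce(accel, gyro_aligned)
loss_shuffled = cross_modal_info_nce(accel, gyro_shuffled)
```

In this toy case the loss is close to zero when matching windows from the two modalities share an embedding and large when they are mismatched, which is the signal a contrastive pre-training stage optimizes. A symmetric variant would average this loss with the same computation in the opposite direction.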
