Multimodal Contrastive Learning with Hard Negative Sampling for Human Activity Recognition
Hyeongju Choi, Apoorva Beedu, Irfan Essa

TL;DR
This paper introduces a novel multimodal contrastive learning approach with hard negative sampling for human activity recognition, improving feature representation and outperforming existing methods on benchmark datasets.
Contribution
It proposes a hard negative sampling method tailored for multimodal HAR, enhancing contrastive learning effectiveness with limited annotated data.
Findings
Outperforms state-of-the-art methods on the UTD-MHAD dataset
Effective in limited data scenarios
Robust feature learning across modalities
Abstract
Human Activity Recognition (HAR) systems have been extensively studied by the vision and ubiquitous computing communities due to their practical applications in daily life, such as smart homes, surveillance, and health monitoring. Typically, this process is supervised in nature and the development of such systems requires access to large quantities of annotated data. However, the high costs and challenges associated with obtaining good-quality annotations have made self-supervised methods an attractive option, and contrastive learning is one such method. A major component of successful contrastive learning, however, is the selection of good positive and negative samples. Although positive samples are directly obtainable, sampling good negative samples remains a challenge. As human activities can be recorded by several modalities like camera and IMU…
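To make "hard negative sampling" in a cross-modal contrastive loss concrete, here is a minimal NumPy sketch. It is not the authors' method: the abstract above is truncated before the sampling rule is described. Instead it swaps in a generic importance-weighted InfoNCE in the style of Robinson et al. (ICLR 2021), assuming that row i of `z_a` (e.g. skeleton or video features) and row i of `z_b` (e.g. IMU features) come from the same clip and that all other rows serve as negatives. The function names and the `beta` and `temperature` parameters are hypothetical. Each negative is weighted in proportion to exp(beta · s / τ), so negatives that look more like the anchor count more in the denominator. Setting `beta = 0` recovers ordinary uniformly weighted InfoNCE.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Project each row onto the unit sphere so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def hard_negative_cross_modal_loss(
    z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.1, beta: float = 1.0
) -> float:
    """Hypothetical cross-modal InfoNCE with importance-weighted hard negatives.

    Assumption: row i of z_a and row i of z_b are a positive pair (same clip);
    every other row of z_b is a negative for row i of z_a (one direction only).
    beta = 0 gives the standard, uniformly weighted InfoNCE loss.
    """
    a, b = l2_normalize(z_a), l2_normalize(z_b)
    n = a.shape[0]
    sim = a @ b.T / temperature              # (n, n) temperature-scaled cosine similarities
    pos = np.diag(sim)                        # positive-pair similarity for each anchor
    off_diag = ~np.eye(n, dtype=bool)         # mask selecting the negatives
    neg = sim[off_diag].reshape(n, n - 1)     # (n, n-1) negative similarities per anchor

    # Weights proportional to exp(beta * sim), normalised to mean 1 per anchor,
    # so more similar ("harder") negatives contribute more to the denominator.
    logits_w = beta * neg
    w = np.exp(logits_w - logits_w.max(axis=1, keepdims=True))
    w = w / w.mean(axis=1, keepdims=True)

    # loss_i = -log(e^pos / (e^pos + sum_j w_j e^neg_j)), shifted by m for stability
    m = np.maximum(pos, neg.max(axis=1))
    denom = np.exp(pos - m) + (w * np.exp(neg - m[:, None])).sum(axis=1)
    loss = -(pos - m) + np.log(denom)
    return float(loss.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    imu = rng.normal(size=(8, 16))                        # stand-in IMU embeddings
    skeleton = imu + 0.1 * rng.normal(size=(8, 16))       # correlated second modality
    print(hard_negative_cross_modal_loss(skeleton, imu, beta=0.0))  # plain InfoNCE
    print(hard_negative_cross_modal_loss(skeleton, imu, beta=1.0))  # hard-negative weighted
```

Because the weights have mean 1 and increase with similarity, the weighted denominator is never smaller than the uniform one. The loss therefore concentrates gradient on confusable negatives without rescaling the overall negative term. A symmetric variant would average this loss over both directions (a to b and b to a).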
Taxonomy
Topics: Context-Aware Activity Recognition Systems · Human Pose and Action Recognition · IoT-based Smart Home Systems
Methods: Contrastive Learning
