Enhancing Inertial Hand based HAR through Joint Representation of Language, Pose and Synthetic IMUs

Vitor Fortes Rey; Lala Shakti Swarup Ray; Xia Qingxin; Kaishun Wu; Paul Lukowicz

arXiv:2406.01316·cs.CV·July 30, 2024

Open Access

TL;DR

This paper introduces Multi$^3$Net, a multi-modal framework that leverages video data and contrastive learning to generate synthetic IMU data, significantly improving wearable HAR performance, especially for subtle activities.

Contribution

The paper presents a novel multi-modal, multitask, contrastive learning approach to synthesize IMU data from videos, enhancing HAR accuracy in recognizing fine-grained activities.

Findings

01. Models trained with synthetic IMU data outperform existing methods.

02. The approach improves recognition of subtle, fine-grained activities.

03. Synthetic data generation enhances HAR performance in real-world scenarios.

Abstract

Due to the scarcity of labeled sensor data in human activity recognition (HAR), prior research has turned to video data to synthesize Inertial Measurement Unit (IMU) data, capitalizing on its rich activity annotations. However, generating IMU data from videos presents challenges for HAR in real-world settings, attributed to the poor quality of synthetic IMU data and its limited efficacy in subtle, fine-grained motions. In this paper, we propose Multi$^3$Net, our novel multi-modal, multitask, and contrastive-based framework to address the issue of limited data. Our pretraining procedure uses videos from online repositories, aiming to learn joint representations of text, pose, and IMU simultaneously. By employing video data and contrastive learning, our method seeks to enhance wearable HAR performance, especially in recognizing subtle activities. Our experimental findings validate the effectiveness of our…
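To make the idea of a joint representation of text, pose, and IMU more concrete, the sketch below shows one common way such an objective can be written: a symmetric InfoNCE contrastive loss averaged over the three modality pairs. This is an illustration under stated assumptions, not the loss used in Multi$^3$Net. The abstract does not specify the objective, so the function names (`info_nce`, `joint_contrastive_loss`), the temperature value, the pairwise averaging, and the random stand-in embeddings are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE: row i of `a` is the positive for row i of `b`.

    Assumption: `a` and `b` are (batch, dim) embeddings of the same clips
    from two different modalities. The temperature is an arbitrary choice.
    """
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature
    idx = np.arange(len(a))
    loss_ab = -log_softmax(logits)[idx, idx].mean()
    loss_ba = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_ab + loss_ba)


def joint_contrastive_loss(text, pose, imu, temperature=0.1):
    """Average the pairwise losses over the three modality pairs (hypothetical)."""
    return (
        info_nce(text, pose, temperature)
        + info_nce(text, imu, temperature)
        + info_nce(pose, imu, temperature)
    ) / 3.0


# Random stand-in embeddings; a real pipeline would use encoder outputs.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))
noisy_pose = base + 0.01 * rng.normal(size=base.shape)
noisy_imu = base + 0.01 * rng.normal(size=base.shape)

aligned_loss = joint_contrastive_loss(base, noisy_pose, noisy_imu)
misaligned_loss = joint_contrastive_loss(base, base[::-1], rng.normal(size=base.shape))
print(f"aligned: {aligned_loss:.4f}, misaligned: {misaligned_loss:.4f}")
```

In this toy setup the loss is low when the three embeddings of each clip agree and high when the pairing is scrambled, which is the property a contrastive pretraining stage relies on.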

Taxonomy

Topics: Hand Gesture Recognition Systems · Robotics and Automated Systems · Social Robot Interaction and HRI