Enhancing Inertial Hand based HAR through Joint Representation of Language, Pose and Synthetic IMUs
Vitor Fortes Rey, Lala Shakti Swarup Ray, Xia Qingxin, Kaishun Wu, Paul Lukowicz

TL;DR
This paper introduces Multi$^3$Net, a multi-modal framework that leverages video data and contrastive learning to generate synthetic IMU data, significantly improving wearable HAR performance, especially for subtle activities.
Contribution
The paper presents a novel multi-modal, multitask, contrastive learning approach to synthesize IMU data from videos, enhancing HAR accuracy in recognizing fine-grained activities.
Findings
Models trained with synthetic IMU data outperform existing methods.
The approach improves recognition of subtle, fine-grained activities.
Synthetic data generation enhances HAR performance in real-world scenarios.
Abstract
Due to the scarcity of labeled sensor data in human activity recognition (HAR), prior research has turned to video data to synthesize Inertial Measurement Unit (IMU) data, capitalizing on its rich activity annotations. However, generating IMU data from videos presents challenges for HAR in real-world settings, attributed to the poor quality of synthetic IMU data and its limited efficacy in subtle, fine-grained motions. In this paper, we propose Multi$^3$Net, our novel multi-modal, multitask, and contrastive-based framework to address the issue of limited data. Our pretraining procedure uses videos from online repositories, aiming to learn joint representations of text, pose, and IMU simultaneously. By employing video data and contrastive learning, our method seeks to enhance wearable HAR performance, especially in recognizing subtle activities. Our experimental findings validate the effectiveness of our…
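The abstract does not spell out the training objective, so the following is only an illustrative sketch of what contrastive learning over text, pose, and IMU embeddings can look like. It uses a generic symmetric InfoNCE loss averaged over the three modality pairs; this is an assumption for illustration, not the authors' actual loss. The function names (`info_nce`, `joint_contrastive_loss`), the temperature value, and the toy embeddings are all hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE: row i of `a` is the positive match for row i of `b`."""
    a = [normalize(v) for v in a]
    b = [normalize(v) for v in b]
    n = len(a)
    # Cosine similarity between every pair in the batch, scaled by temperature.
    logits = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def row_loss(rows):
        # Cross-entropy with the diagonal entry as the correct class.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (row_loss(logits) + row_loss(transposed))


def joint_contrastive_loss(text, pose, imu, temperature=0.1):
    """Average the pairwise losses over the three modality pairs (an assumption)."""
    return (
        info_nce(text, pose, temperature)
        + info_nce(text, imu, temperature)
        + info_nce(pose, imu, temperature)
    ) / 3.0


# Toy data: 8 samples with 16-dimensional embeddings per modality.
rng = random.Random(0)


def random_batch():
    return [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(8)]


def jitter(batch, scale=0.01):
    return [[x + rng.gauss(0.0, scale) for x in v] for v in batch]


text_emb = random_batch()
# Aligned case: pose and IMU embeddings are small perturbations of the text ones.
aligned = joint_contrastive_loss(text_emb, jitter(text_emb), jitter(text_emb))
# Mismatched case: the three modalities are unrelated random vectors.
mismatched = joint_contrastive_loss(text_emb, random_batch(), random_batch())
print(f"aligned={aligned:.4f} mismatched={mismatched:.4f}")
```

In this sketch the loss is low when embeddings of the same sample agree across modalities and high when they are unrelated, which is the property a joint representation of text, pose, and IMU would rely on.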
Taxonomy
Topics: Hand Gesture Recognition Systems · Robotics and Automated Systems · Social Robot Interaction and HRI
