MMTSA: Multimodal Temporal Segment Attention Network for Efficient Human Activity Recognition

Ziqi Gao; Yuntao Wang; Jianguo Chen; Junliang Xing; Shwetak Patel; Xin Liu; Yuanchun Shi

arXiv:2210.09222 · cs.CV · October 13, 2023 · 6 citations

TL;DR

This paper introduces MMTSA, an efficient multimodal neural network for human activity recognition (HAR) that fuses RGB camera and IMU data, reporting higher accuracy and lower computational cost on three public datasets.

Contribution

The paper presents MMTSA, a novel neural architecture that transforms IMU data into images, employs sparse sampling, and uses inter-segment attention for improved multimodal fusion in HAR.
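
The sparse sampling idea can be sketched as follows. This is a minimal illustration that assumes a temporal-segment-style scheme: the recording is split into equal segments and one item is drawn from each. The function name `sample_segment_indices` and its parameters are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np

def sample_segment_indices(num_items, num_segments, rng=None):
    """Pick one random index from each of `num_segments` equal-width segments.

    Assumption: segment-wise sampling in the style of temporal segment
    networks; the paper's exact sampling rule may differ.
    """
    if num_items < num_segments:
        raise ValueError("need at least one item per segment")
    rng = np.random.default_rng() if rng is None else rng
    # Integer segment boundaries covering [0, num_items).
    edges = np.linspace(0, num_items, num_segments + 1).astype(int)
    # One index per segment; the upper bound of rng.integers is exclusive.
    return np.array([rng.integers(edges[k], edges[k + 1])
                     for k in range(num_segments)])

# Example: 90 RGB frames reduced to 3 sampled frames.
frame_idx = sample_segment_indices(90, 3, np.random.default_rng(0))
```

Applying the same segment boundaries in time to both the RGB frames and the IMU windows would keep the sampled inputs of the two modalities aligned, which is one plausible reading of "multimodal" sparse sampling.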

Findings

01. Achieved an 11.13% higher cross-subject F1-score on the MMAct dataset.

02. Demonstrated superior efficiency, with lower computational load and latency.

03. Demonstrated the effectiveness of multimodal fusion and sparse sampling in HAR.

Abstract

Multimodal sensors provide complementary information to develop accurate machine-learning methods for human activity recognition (HAR), but introduce significantly higher computational load, which reduces efficiency. This paper proposes an efficient multimodal neural architecture for HAR using an RGB camera and inertial measurement units (IMUs) called Multimodal Temporal Segment Attention Network (MMTSA). MMTSA first transforms IMU sensor data into a temporal and structure-preserving gray-scale image using the Gramian Angular Field (GAF), representing the inherent properties of human activities. MMTSA then applies a multimodal sparse sampling method to reduce data redundancy. Lastly, MMTSA adopts an inter-segment attention module for efficient multimodal fusion. Using three well-established public datasets, we evaluated MMTSA's effectiveness and efficiency in HAR. Results show that our…
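
The Gramian Angular Field step mentioned in the abstract can be illustrated with a short sketch. It assumes the summation variant of the GAF (entries cos(φ_i + φ_j)) applied to a single IMU axis, followed by a linear mapping to 8-bit gray levels. The abstract does not state which variant or normalization MMTSA uses, so the function names and these choices are illustrative assumptions.

```python
import numpy as np

def gramian_angular_field(series):
    """Summation-type GAF of a 1-D series (assumed variant)."""
    x = np.asarray(series, dtype=np.float64)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # A constant series has no range to rescale; map it to zeros.
        scaled = np.zeros_like(x)
    else:
        # Min-max rescale to [-1, 1] so that arccos is defined.
        scaled = 2.0 * (x - lo) / (hi - lo) - 1.0
    scaled = np.clip(scaled, -1.0, 1.0)  # guard against rounding error
    phi = np.arccos(scaled)              # polar-angle encoding
    # G[i, j] = cos(phi_i + phi_j); time order is kept along both axes.
    return np.cos(phi[:, None] + phi[None, :])

def to_grayscale(gaf):
    """Map GAF values in [-1, 1] to uint8 gray levels in [0, 255]."""
    return np.round((gaf + 1.0) * 127.5).astype(np.uint8)

# Example: one accelerometer axis with 64 samples becomes a 64x64 image.
signal = np.sin(np.linspace(0.0, 4.0 * np.pi, 64))
image = to_grayscale(gramian_angular_field(signal))
```

Because each matrix entry depends on a pair of time steps, the image encodes temporal correlations of the signal while keeping time order along the diagonal, which fits the abstract's description of a temporal and structure-preserving representation.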

Code & Models

Repositories

thuhci/MMTSA (official)

Taxonomy

Topics: Context-Aware Activity Recognition Systems · Human Pose and Action Recognition · Advanced Technologies in Various Fields