USAD: End-to-End Human Activity Recognition via Diffusion Model with Spatiotemporal Attention

Hang Xiao; Ying Yu; Jiarui Li; Zhifan Yang; Haotian Tang; Hanyu Liu; Chao Li

arXiv:2507.02827 · cs.CV · July 14, 2025

TL;DR

This paper introduces USAD, a novel end-to-end human activity recognition model combining diffusion-based data augmentation with multi-scale spatio-temporal attention mechanisms, achieving high accuracy and efficiency on lightweight devices.

Contribution

The paper presents a comprehensive approach integrating an unsupervised diffusion model for data augmentation with a multi-branch attention network for improved feature extraction in HAR.

Findings

01. Achieved over 98% accuracy on the WISDM dataset.

02. Outperformed existing methods on the PAMAP2 and OPPORTUNITY datasets.

03. Validated efficiency on embedded devices.

Abstract

The primary objective of human activity recognition (HAR) is to infer ongoing human actions from sensor data, a task that finds broad applications in health monitoring, safety protection, and sports analysis. Despite proliferating research, HAR still faces key challenges, including the scarcity of labeled samples for rare activities, insufficient extraction of high-level features, and suboptimal model performance on lightweight devices. To address these issues, this paper proposes a comprehensive optimization approach centered on multi-attention interaction mechanisms. First, an unsupervised, statistics-guided diffusion model is employed to perform data augmentation, thereby alleviating the problems of labeled data scarcity and severe class imbalance. Second, a multi-branch spatio-temporal interaction network is designed, which captures multi-scale features of sequential data through…
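
To make the augmentation idea concrete, here is a minimal sketch of the forward (noising) step of a standard denoising diffusion model applied to one window of sensor readings. The abstract does not specify how the statistics-guided diffusion model is formulated, so this is the generic DDPM forward process and not the authors' method. The function names, the linear noise schedule, and the toy three-channel window are all assumptions made for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (a common DDPM default; assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_noise(window, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.

    `window` is a list of time steps, each a list of channel values.
    """
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [
        [signal_scale * value + noise_scale * rng.gauss(0.0, 1.0) for value in row]
        for row in window
    ]


# Hypothetical 64-step, 3-channel accelerometer-like window.
rng = random.Random(0)
window = [[math.sin(0.1 * i), math.cos(0.1 * i), 0.5] for i in range(64)]

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
noisy = forward_noise(window, 500, alpha_bars, rng)
```

In a full augmentation pipeline, a trained denoising network would reverse this process to generate new windows for under-represented activity classes; that network and any statistical guidance are outside the scope of this sketch.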
