MoPFormer: Motion-Primitive Transformer for Wearable-Sensor Activity Recognition

Hao Zhang; Zhan Zhuang; Xuehao Wang; Xiaodong Yang; Yu Zhang

arXiv:2505.20744 · cs.CV · October 29, 2025

TL;DR

MoPFormer is a self-supervised Transformer-based framework that tokenizes wearable sensor signals into meaningful motion primitives, improving interpretability and cross-dataset generalization in human activity recognition.

Contribution

The paper introduces motion-primitive tokenization and a self-supervised training scheme intended to improve the interpretability and robustness of HAR models.

Findings

01. Outperforms state-of-the-art HAR methods on six benchmarks.

02. Demonstrates strong cross-dataset generalization.

03. Motion primitives improve interpretability and consistency across datasets.

Abstract

Human Activity Recognition (HAR) with wearable sensors is challenged by limited interpretability, which significantly impacts cross-dataset generalization. To address this challenge, we propose Motion-Primitive Transformer (MoPFormer), a novel self-supervised framework that enhances interpretability by tokenizing inertial measurement unit signals into semantically meaningful motion primitives and leverages a Transformer architecture to learn rich temporal representations. MoPFormer comprises two stages. The first stage is to partition multi-channel sensor streams into short segments and quantize them into discrete "motion primitive" codewords, while the second stage enriches those tokenized sequences through a context-aware embedding module and then processes them with a Transformer encoder. The proposed MoPFormer can be pre-trained using a masked motion-modeling objective that…
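The first stage described in the abstract (segment, then quantize to codewords) can be illustrated with a minimal sketch. This is not the authors' implementation: the non-overlapping windows, the squared-Euclidean nearest-codeword rule, the hand-written two-entry codebook, and the names `segment` and `quantize` are all assumptions made for illustration. The abstract does not say how the codebook is learned.

```python
from typing import List, Sequence


def segment(stream: Sequence[Sequence[float]], window: int) -> List[List[float]]:
    """Split a (time, channels) stream into non-overlapping windows.

    Each window is flattened into one vector. Non-overlapping windows are
    an assumption; trailing samples that do not fill a window are dropped.
    """
    segments = []
    for start in range(0, len(stream) - window + 1, window):
        flat = [v for sample in stream[start:start + window] for v in sample]
        segments.append(flat)
    return segments


def quantize(segments: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each segment to the index of its nearest codeword.

    Squared Euclidean distance is an assumed choice of metric.
    """
    tokens = []
    for seg in segments:
        dists = [sum((a - b) ** 2 for a, b in zip(seg, code)) for code in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens


# Toy two-channel stream with six time steps (made-up values).
stream = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1], [0.0, 0.1], [0.0, 0.0]]

# Hypothetical codebook: index 0 ~ "still", index 1 ~ "moving".
codebook = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]

tokens = quantize(segment(stream, window=2), codebook)
print(tokens)  # [0, 1, 0]
```

The resulting token sequence is the kind of discrete input that the second stage would embed and pass to a Transformer encoder.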

Taxonomy

Topics: Hand Gesture Recognition Systems · Robotics and Automated Systems · Context-Aware Activity Recognition Systems

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Residual Connection · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing