WS-IMUBench: Can Weakly Supervised Methods from Audio, Image, and Video Be Adapted for IMU-based Temporal Action Localization?

Pei Li; Jiaxi Yin; Lei Ouyang; Shihan Pan; Ge Wang; Han Ding; Fei Wang

arXiv:2602.01850 · cs.CV · February 3, 2026

TL;DR

This paper systematically evaluates the transferability of weakly supervised localization methods from audio, image, and video to IMU-based temporal action localization, highlighting modality-dependent effectiveness and challenges with short actions.

Contribution

The paper introduces WS-IMUBench, a benchmark for weakly supervised IMU-TAL, and provides a comprehensive evaluation of existing methods across multiple datasets to guide future research directions.

Findings

01. Temporal-domain methods are more stable than image-derived approaches.

02. Weak supervision is effective on datasets with longer actions.

03. Short actions and proposal quality are major failure modes.

Abstract

IMU-based Human Activity Recognition (HAR) has enabled a wide range of ubiquitous computing applications, yet its dominant clip classification paradigm cannot capture the rich temporal structure of real-world behaviors. This motivates a shift toward IMU Temporal Action Localization (IMU-TAL), which predicts both action categories and their start/end times in continuous streams. However, current progress is strongly bottlenecked by the need for dense, frame-level boundary annotations, which are costly and difficult to scale. To address this bottleneck, we introduce WS-IMUBench, a systematic benchmark study of weakly supervised IMU-TAL (WS-IMU-TAL) under only sequence-level labels. Rather than proposing a new localization algorithm, we evaluate how well established weakly supervised localization paradigms from audio, image, and video transfer to IMU-TAL in this setting.…
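To make the supervision gap described in the abstract concrete, here is a minimal Python sketch contrasting boundary-level annotations with the sequence-level labels used in the weakly supervised setting. It is an illustration only, not the paper's code or data format: the `Segment` type, the `to_sequence_labels` helper, and the example activities and timestamps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One fully supervised annotation: an action and its boundaries in seconds."""
    label: str
    start: float
    end: float


def to_sequence_labels(segments):
    """Collapse boundary annotations into a weak, sequence-level label set.

    Only the categories present in the recording survive; start and end
    times are discarded, which is what a weakly supervised model must recover.
    """
    return sorted({seg.label for seg in segments})


# Hypothetical annotations for one continuous IMU recording (assumed values).
dense = [
    Segment("walk", 0.0, 12.5),
    Segment("sit", 12.5, 30.0),
    Segment("walk", 30.0, 41.0),
]

weak = to_sequence_labels(dense)
print(weak)  # ['sit', 'walk']
```

A fully supervised IMU-TAL model would train on `dense`, while a weakly supervised one sees only `weak` during training and is still evaluated on how well it predicts the segment boundaries.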

Taxonomy

Topics: Human Pose and Action Recognition · Context-Aware Activity Recognition Systems · Multimodal Machine Learning Applications