MoBind: Motion Binding for Fine-Grained IMU-Video Pose Alignment

Duc Duy Nguyen; Tat-Jun Chin; Minh Hoai

arXiv:2602.19004 · cs.CV · February 24, 2026

Open Access

TL;DR

MoBind introduces a hierarchical contrastive learning framework that effectively aligns IMU signals with video-based skeletal motion, enabling precise cross-modal retrieval, synchronization, and action recognition.

Contribution

It presents a novel multi-level contrastive approach for fine-grained IMU-video alignment, decomposing full-body motion into local parts for improved semantic and temporal accuracy.

Findings

01. Outperforms baselines on multiple datasets
02. Achieves robust fine-grained temporal alignment
03. Preserves semantic consistency across modalities

Abstract

We aim to learn a joint representation between inertial measurement unit (IMU) signals and 2D pose sequences extracted from video, enabling accurate cross-modal retrieval, temporal synchronization, subject and body-part localization, and action recognition. To this end, we introduce MoBind, a hierarchical contrastive learning framework designed to address three challenges: (1) filtering out irrelevant visual background, (2) modeling structured multi-sensor IMU configurations, and (3) achieving fine-grained, sub-second temporal alignment. To isolate motion-relevant cues, MoBind aligns IMU signals with skeletal motion sequences rather than raw pixels. We further decompose full-body motion into local body-part trajectories, pairing each with its corresponding IMU to enable semantically grounded multi-sensor alignment. To capture detailed temporal correspondence, MoBind employs a…
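The part-level pairing described in the abstract, where each body-part trajectory is matched with its corresponding IMU, can be illustrated with a generic contrastive objective. The sketch below is an assumption, not MoBind's actual loss: the abstract is truncated before the training objective is specified, so a standard symmetric InfoNCE loss is swapped in here. The function names (`symmetric_info_nce`, `part_level_loss`), the temperature value, and the list-of-lists embedding layout are all hypothetical and chosen only for illustration.

```python
import math

def _normalize(vec):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross_entropy_diag(logits):
    # Mean cross-entropy where the correct "class" for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def symmetric_info_nce(imu_emb, pose_emb, temperature=0.07):
    """Generic symmetric InfoNCE (assumed, not taken from the paper).

    imu_emb, pose_emb: lists of B embedding vectors; index i in both
    lists is assumed to come from the same motion clip.
    """
    imu = [_normalize(v) for v in imu_emb]
    pose = [_normalize(v) for v in pose_emb]
    # logits[i][j] = scaled cosine similarity between IMU i and pose j.
    logits = [[_dot(a, b) / temperature for b in pose] for a in imu]
    transposed = [list(col) for col in zip(*logits)]
    # Average the IMU-to-pose and pose-to-IMU retrieval directions.
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(transposed))

def part_level_loss(imu_parts, pose_parts, temperature=0.07):
    """Average the contrastive loss over body parts (hypothetical layout).

    imu_parts[p] and pose_parts[p] hold the batch of embeddings for
    body part p, e.g. one wrist IMU and the matching wrist trajectory.
    """
    losses = [
        symmetric_info_nce(imu, pose, temperature)
        for imu, pose in zip(imu_parts, pose_parts)
    ]
    return sum(losses) / len(losses)

if __name__ == "__main__":
    imu_batch = [[1.0, 0.0], [0.0, 1.0]]
    matched_pose = [[1.0, 0.0], [0.0, 1.0]]
    swapped_pose = [[0.0, 1.0], [1.0, 0.0]]
    print(symmetric_info_nce(imu_batch, matched_pose))
    print(symmetric_info_nce(imu_batch, swapped_pose))
```

With the toy inputs in the `__main__` block, correctly matched pairs produce a loss close to zero and swapped pairs produce a much larger one, which is the behavior a cross-modal retrieval objective of this kind is meant to reward.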

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Balance, Gait, and Falls Prevention