AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild
Baiyu Chen, Zechen Li, Wilson Wongso, Lihuan Li, Xiachong Lin, Hao Xue, Benjamin Tag, Flora Salim

TL;DR
AnyMo is a geometry-aware, setup-agnostic framework that models human motion using synthetic data and aligns motion tokens with language models, enabling robust zero-shot recognition, retrieval, and captioning across diverse datasets.
Contribution
It introduces a novel physics-grounded method for generating synthetic IMU signals and a graph encoder aligned with language models for versatile human motion understanding.
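Physics-grounded IMU simulation is commonly done by taking the trajectory of a point on the body surface, differentiating it twice, subtracting gravity, and rotating the result into the sensor frame. The sketch below shows that technique for an accelerometer. It is an assumption for illustration, not AnyMo's actual simulator. The function names, the z-up gravity convention, sensor-to-world rotation matrices, and the central finite-difference scheme are all hypothetical choices.

```python
import math

# Hypothetical convention: world z axis points up, so gravity is -9.81 m/s^2 along z.
GRAVITY = (0.0, 0.0, -9.81)


def second_difference(positions, dt):
    """Central finite-difference acceleration for the interior frames of a 3D trajectory."""
    acc = []
    for t in range(1, len(positions) - 1):
        prev, cur, nxt = positions[t - 1], positions[t], positions[t + 1]
        acc.append(tuple((nxt[i] - 2.0 * cur[i] + prev[i]) / (dt * dt) for i in range(3)))
    return acc


def mat_T_vec(R, v):
    """Compute R^T v for a 3x3 matrix R given as nested lists."""
    return tuple(sum(R[r][c] * v[r] for r in range(3)) for c in range(3))


def simulate_accelerometer(positions, rotations, dt):
    """Specific force in the sensor frame, R_t^T (a_t - g), for frames 1..T-2.

    positions: world-frame positions of one body-surface placement, one per frame.
    rotations: sensor-to-world rotation matrices, one per frame.
    dt: sampling interval in seconds.
    """
    readings = []
    for t, a in enumerate(second_difference(positions, dt), start=1):
        # An accelerometer measures acceleration minus gravity, not acceleration alone.
        specific_force = tuple(a[i] - GRAVITY[i] for i in range(3))
        readings.append(mat_T_vec(rotations[t], specific_force))
    return readings
```

For a placement at rest with identity orientation, every reading is (0, 0, 9.81): the sensor reports the reaction to gravity rather than zero. Varying the placement point and the rotation matrices is what makes the same body motion produce different signals under different setups.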
Findings
Improves zero-shot activity recognition accuracy by 11.7%.
Enhances cross-modal retrieval MRR by 15.9% and 28.6%.
Boosts zero-shot captioning BERT-F1 by 18.8%.
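The retrieval finding is reported in MRR (mean reciprocal rank): for each query, take 1/rank of the single correct candidate in the ranked list, then average over all queries, so MRR = (1/|Q|) Σ_q 1/rank_q. A minimal sketch follows, assuming a query-by-candidate similarity matrix (for example, IMU-to-text scores) and optimistic tie handling; both are illustrative assumptions, not the paper's evaluation code.

```python
def mean_reciprocal_rank(similarity, targets):
    """MRR over queries.

    similarity: similarity[q][c] scores candidate c for query q (higher is better).
    targets: targets[q] is the index of the single correct candidate for query q.
    """
    total = 0.0
    for row, gold in zip(similarity, targets):
        # Rank = 1 + number of candidates scored strictly higher than the correct one
        # (ties are resolved in favor of the correct candidate).
        rank = 1 + sum(1 for score in row if score > row[gold])
        total += 1.0 / rank
    return total / len(targets)


sim = [[0.9, 0.1, 0.3], [0.2, 0.8, 0.5], [0.7, 0.6, 0.1]]
print(mean_reciprocal_rank(sim, [0, 2, 1]))  # 0.6666666666666666
```

Here the first query ranks its target first (1/1) and the other two rank theirs second (1/2 each), giving (1 + 0.5 + 0.5) / 3.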
Abstract
As wearable and mobile devices become increasingly embedded in daily life, they offer a practical way to continuously sense human motion in the wild. But inertial signals are highly dependent on the sensing setup, including body location, mounting position, sensor orientation, device hardware, and sampling protocol. This setup dependence makes it difficult to learn motion representations that transfer across devices and datasets, and limits the broader use of wearable IMUs beyond closed-set recognition. We introduce AnyMo, a geometry-aware framework for setup-agnostic human motion modeling. AnyMo uses physics-grounded IMU simulation over dense body-surface placements to generate diverse and plausible synthetic signals, pre-trains a graph encoder from paired synthetic placement views and masked partial observations, tokenizes multi-position IMU into full-body motion tokens, and aligns…
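Pre-training from masked partial observations relies on the body-placement graph letting information flow from placements that carry a sensor to ones that do not. As a toy illustration only, the sketch below runs non-learned mean-neighbor message passing over a placement graph with masked nodes. AnyMo's encoder is learned and its architecture is not described here, so the node names, edges, scalar features, and fixed averaging rule are all hypothetical.

```python
def propagate_masked(features, edges, observed, rounds=2):
    """Fill unobserved nodes by repeated mean aggregation over graph neighbors.

    features: dict node -> float feature (values at unobserved nodes are ignored).
    edges: list of undirected (u, v) pairs between body placements.
    observed: set of nodes that actually carry a sensor.
    rounds: number of message-passing steps (information travels one hop per round).
    """
    neighbors = {n: [] for n in features}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    # Masked nodes start with no information (None); observed values stay fixed.
    state = {n: (features[n] if n in observed else None) for n in features}
    for _ in range(rounds):
        new_state = dict(state)
        for n in features:
            if n in observed:
                continue
            known = [state[m] for m in neighbors[n] if state[m] is not None]
            if known:
                new_state[n] = sum(known) / len(known)
        state = new_state
    return state
```

On a chain left_wrist, left_elbow, left_shoulder, chest, pelvis, right_thigh with sensors only at left_wrist (2.0) and chest (4.0), one round leaves right_thigh empty because it is two hops from any sensor; after two rounds every placement has a value, and left_elbow and left_shoulder both settle at 3.0. A learned encoder replaces the fixed average with trainable functions and is trained to reconstruct the masked nodes.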
