Egocentric Action-aware Inertial Localization in Point Clouds with Vision-Language Guidance
Mingfang Zhang, Ryo Yonetani, Yifei Huang, Liangyang Ouyang, Ruicong Liu, Yoichi Sato

TL;DR
This paper introduces EAIL, a framework that uses egocentric action cues from head-mounted IMU signals, combined with vision-language guidance, to improve inertial localization within 3D point clouds, addressing challenges of noise and diverse human motions.
Contribution
The paper proposes a novel multimodal learning approach that correlates IMU-based action cues with environmental features for improved localization and action recognition.
Findings
Outperforms state-of-the-art inertial localization methods.
Effectively correlates human actions with environmental structures.
Enables concurrent action recognition from IMU and visual data.
Abstract
This paper presents a novel inertial localization framework named Egocentric Action-aware Inertial Localization (EAIL), which leverages egocentric action cues from head-mounted IMU signals to localize the target individual within a 3D point cloud. Human inertial localization is challenging due to IMU sensor noise that causes trajectory drift over time. The diversity of human actions further complicates IMU signal processing by introducing various motion patterns. Nevertheless, we observe that some actions captured by the head-mounted IMU correlate with spatial environmental structures (e.g., bending down to look inside an oven, washing dishes next to a sink), thereby serving as spatial anchors to compensate for the localization drift. The proposed EAIL framework learns such correlations via hierarchical multi-modal alignment with vision-language guidance. By assuming that the 3D point…
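The drift problem mentioned in the abstract can be illustrated with a small numerical sketch. This is not the EAIL method: it is a 1D dead-reckoning toy in which the sampling rate, bias, noise level, and the `dead_reckon_with_anchor` helper are all assumptions made for illustration. It shows how a small constant accelerometer bias grows into a large position error under double integration, and how a single known position fix (a crude stand-in for an action-based spatial anchor) bounds that error.

```python
import numpy as np

def dead_reckon(accel, dt):
    """Integrate 1D acceleration twice (simple Euler sums) to get position."""
    velocity = np.cumsum(accel) * dt
    position = np.cumsum(velocity) * dt
    return position

def dead_reckon_with_anchor(accel, dt, anchor_index, anchor_position):
    """Hypothetical helper: restart integration at a known position fix.

    Assumes the anchor resets both position and velocity, which is a
    simplification and not how EAIL uses action cues.
    """
    first = dead_reckon(accel[:anchor_index], dt)
    second = anchor_position + dead_reckon(accel[anchor_index:], dt)
    return np.concatenate([first, second])

dt = 0.01                       # assumed 100 Hz sampling
t = np.arange(0, 60, dt)        # one minute of data
bias = 0.05                     # assumed constant bias in m/s^2
rng = np.random.default_rng(0)
# The wearer stands still, so the true acceleration and position are zero.
measured = bias + rng.normal(0.0, 0.02, size=t.shape)

drift = dead_reckon(measured, dt)
corrected = dead_reckon_with_anchor(measured, dt, len(t) // 2, 0.0)

print(f"error without anchor: {drift[-1]:.1f} m")
print(f"error with one anchor at 30 s: {corrected[-1]:.1f} m")
```

Because the bias term contributes roughly 0.5 · bias · t² to the position, the uncorrected error grows quadratically with time, while the anchored trajectory only accumulates error over the interval since the last fix.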
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Multimodal Machine Learning Applications
Methods: ALIGN
