Egocentric Action-aware Inertial Localization in Point Clouds with Vision-Language Guidance

Mingfang Zhang; Ryo Yonetani; Yifei Huang; Liangyang Ouyang; Ruicong Liu; Yoichi Sato

arXiv:2505.14346·cs.CV·July 29, 2025

TL;DR

This paper introduces EAIL, a framework that uses egocentric action cues from head-mounted IMU signals, combined with vision-language guidance, to improve inertial localization within 3D point clouds, addressing challenges of noise and diverse human motions.

Contribution

The paper proposes a novel multimodal learning approach that correlates IMU-based action cues with environmental features for improved localization and action recognition.

Findings

1. Outperforms state-of-the-art inertial localization methods.

2. Effectively correlates human actions with environmental structures.

3. Enables concurrent action recognition from IMU and visual data.

Abstract

This paper presents a novel inertial localization framework named Egocentric Action-aware Inertial Localization (EAIL), which leverages egocentric action cues from head-mounted IMU signals to localize the target individual within a 3D point cloud. Human inertial localization is challenging due to IMU sensor noise that causes trajectory drift over time. The diversity of human actions further complicates IMU signal processing by introducing various motion patterns. Nevertheless, we observe that some actions captured by the head-mounted IMU correlate with spatial environmental structures (e.g., bending down to look inside an oven, washing dishes next to a sink), thereby serving as spatial anchors to compensate for the localization drift. The proposed EAIL framework learns such correlations via hierarchical multi-modal alignment with vision-language guidance. By assuming that the 3D point…

Code & Models

Repositories

mf-zhang/ego-inertial-localization · Official

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Multimodal Machine Learning Applications

Methods: ALIGN