MADiff: Motion-Aware Mamba Diffusion Models for Hand Trajectory Prediction on Egocentric Videos

Junyi Ma; Xieyuanli Chen; Wentao Bao; Jingyi Xu; Hesheng Wang

arXiv:2409.02638 · cs.CV · November 17, 2025

Open Access

TL;DR

MADiff is a novel diffusion-based model that predicts hand trajectories in egocentric videos by integrating egomotion and high-level semantics, improving accuracy and real-time performance for applications in robotics and extended reality.

Contribution

The paper introduces MADiff, a motion-aware diffusion model that incorporates egomotion and semantic features for improved hand trajectory prediction without explicit affordance labels.

Findings

01. Achieves accuracy comparable to state-of-the-art methods.

02. Operates in real time for practical applications.

03. Demonstrates effectiveness across five public datasets.

Abstract

Understanding human intentions and actions through egocentric videos is important on the path to embodied artificial intelligence. As a branch of egocentric vision techniques, hand trajectory prediction plays a vital role in comprehending human motion patterns, benefiting downstream tasks in extended reality and robot manipulation. However, capturing high-level human intentions consistent with reasonable temporal causality is challenging when only egocentric videos are available. This difficulty is exacerbated under camera egomotion interference and the absence of affordance labels to explicitly guide the optimization of hand waypoint distribution. In this work, we propose a novel hand trajectory prediction method dubbed MADiff, which forecasts future hand waypoints with diffusion models. The devised denoising operation in the latent space is achieved by our proposed motion-aware Mamba,…

Taxonomy

Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Stroke Rehabilitation and Recovery

Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Diffusion