Learning Context-Adaptive Motion Priors for Masked Motion Diffusion Models with Efficient Kinematic Attention Aggregation

Junkun Jiang; Jie Chen; Ho Yin Au; Jingyu Xiang

arXiv:2603.07697 · cs.CV · March 10, 2026

Open Access

TL;DR

This paper introduces the Masked Motion Diffusion Model (MMDM), a diffusion-based framework that uses Kinematic Attention Aggregation (KAA) for adaptive, efficient 3D motion reconstruction from incomplete data, outperforming existing methods across various tasks.

Contribution

The paper proposes a novel Masked Motion Diffusion Model with Kinematic Attention Aggregation that learns context-adaptive motion priors for diverse motion reconstruction tasks.
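The contribution pairs a diffusion model with Masked Autoencoder-style conditioning, but this summary does not give the exact formulation. The sketch here assumes a common inpainting-style setup: observed joints are passed to the denoiser clean, only hidden joints are noised with the standard DDPM forward process, and the noise-prediction loss is computed on hidden joints only. The function names, the linear β schedule, and the 22-joint toy skeleton are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def masked_forward_diffusion(x0, visible, t, alpha_bar, rng):
    """Noise only the hidden joints of a motion clip (assumed inpainting-style conditioning).

    x0:        clean motion, shape (frames, joints, channels)
    visible:   boolean mask, shape (frames, joints); True = observed joint
    t:         integer diffusion step
    alpha_bar: cumulative product of (1 - beta) for each step
    """
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps   # standard DDPM sample of q(x_t | x_0)
    m = visible[..., None]                          # broadcast the mask over channels
    # Observed joints pass through clean; hidden joints carry the noisy sample.
    return np.where(m, x0, x_t), eps

def masked_denoising_loss(eps_pred, eps, visible):
    """Mean squared noise-prediction error over hidden joints only."""
    hidden = ~visible[..., None]
    n = hidden.sum() * eps.shape[-1]
    if n == 0:
        return 0.0
    return float(((eps_pred - eps) ** 2 * hidden).sum() / n)

# Toy usage: 8 frames, 22 joints, xyz positions, linear beta schedule (hypothetical values).
rng = np.random.default_rng(0)
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
x0 = rng.standard_normal((8, 22, 3))
visible = rng.random((8, 22)) > 0.3
x_in, eps = masked_forward_diffusion(x0, visible, t=500, alpha_bar=alpha_bar, rng=rng)
loss = masked_denoising_loss(np.zeros_like(eps), eps, visible)
```

Restricting the loss to hidden joints mirrors the Masked Autoencoder idea of supervising only what the model cannot see; whether MMDM also supervises visible joints is not stated here.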

Findings

01. Achieves strong performance on public benchmarks.

02. Effectively handles various masking strategies.

03. Applies to multiple motion reconstruction tasks.
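This summary does not list which masking strategies were evaluated. As an illustration only, three common ways of hiding motion data can be written as boolean visibility masks over (frames, joints): independent random joint dropout, a contiguous temporal gap, and a whole-clip body-part occlusion. The function names and the joint indices used for the occluded "arm" are hypothetical.

```python
import numpy as np

def random_joint_mask(frames, joints, ratio, rng):
    """Hide each (frame, joint) entry independently with probability `ratio`."""
    return rng.random((frames, joints)) >= ratio

def temporal_gap_mask(frames, joints, start, length):
    """Hide every joint in a contiguous window of frames (e.g. a tracking dropout)."""
    visible = np.ones((frames, joints), dtype=bool)
    visible[start:start + length] = False
    return visible

def body_part_mask(frames, joints, part_joints):
    """Hide a fixed set of joints for the whole clip (e.g. an occluded limb)."""
    visible = np.ones((frames, joints), dtype=bool)
    visible[:, part_joints] = False
    return visible

rng = np.random.default_rng(0)
m1 = random_joint_mask(30, 22, 0.5, rng)
m2 = temporal_gap_mask(30, 22, start=10, length=5)
m3 = body_part_mask(30, 22, part_joints=[16, 18, 20])   # hypothetical arm joints
combined = m2 & m3   # a joint is visible only if no strategy hides it
```

Masks compose with a logical AND, so mixed occlusion patterns can be generated from these simple building blocks.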

Abstract

Vision-based motion capture solutions often struggle with occlusions, which result in the loss of critical joint information and hinder accurate 3D motion reconstruction. Wearable alternatives, in turn, suffer from noisy or unstable data, often requiring extensive manual cleaning and correction to achieve reliable results. To address these challenges, we introduce the Masked Motion Diffusion Model (MMDM), a diffusion-based generative reconstruction framework that enhances incomplete or low-confidence motion data using partially available high-quality reconstructions within a Masked Autoencoder architecture. Central to our design is the Kinematic Attention Aggregation (KAA) mechanism, which enables efficient, deep, and iterative encoding of both joint-level and pose-level features, capturing structural and temporal motion patterns essential for task-specific reconstruction. We focus on…
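The abstract describes KAA only at a high level: deep, iterative encoding of joint-level and pose-level features. One hypothetical reading is sketched here in NumPy with untrained random weights. Within each frame, joints attend to one another (structure); each frame is then mean-pooled into a pose token and pose tokens attend across time; the resulting temporal context is broadcast back to every joint, and the round repeats. The function names, mean pooling, single-head attention, and number of rounds are all assumptions, not the authors' design.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the token axis.

    tokens: (..., n, d) -> returns (..., n, d)
    """
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def kinematic_aggregation_block(x, params):
    """One hypothetical joint-level + pose-level round. x: (frames, joints, d)."""
    # Joint level: joints in the same frame attend to each other.
    x = x + self_attention(x, *params["joint"])
    # Pose level: pool each frame into one pose token, then attend across frames.
    pose = x.mean(axis=1)                                 # (frames, d)
    context = self_attention(pose, *params["pose"])       # (frames, d)
    # Broadcast the temporal context back to every joint of its frame.
    return x + context[:, None, :]

def kaa_encoder(x, blocks):
    """Apply the block iteratively, one parameter set per round."""
    for params in blocks:
        x = kinematic_aggregation_block(x, params)
    return x

def init_params(d, rng, scale=0.1):
    def proj():
        return tuple(rng.standard_normal((d, d)) * scale for _ in range(3))
    return {"joint": proj(), "pose": proj()}

rng = np.random.default_rng(0)
frames, joints, d = 16, 22, 32
x = rng.standard_normal((frames, joints, d))
blocks = [init_params(d, rng) for _ in range(4)]   # 4 rounds, an arbitrary choice
out = kaa_encoder(x, blocks)
```

Because this sketch uses no joint embeddings or skeleton adjacency, it is permutation-equivariant over joints. A genuinely kinematics-aware encoder would presumably inject such structure, for example through learned per-joint embeddings or bone-adjacency attention biases.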

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Human Motion and Animation