SMamDiff: Spatial Mamba for Stochastic Human Motion Prediction
Junqiao Fan, Pengfei Liu, Haocong Rao

TL;DR
SMamDiff is a single-stage diffusion model for stochastic human motion prediction that enforces spatial-temporal coherence, reporting state-of-the-art accuracy on Human3.6M and HumanEva with lower latency and memory usage than multi-stage diffusion methods.
Contribution
The paper proposes SMamDiff, a spatial Mamba-based diffusion model with residual-DCT encoding and a joint-by-joint processing module for coherent, probabilistic human motion prediction.
Findings
Achieves state-of-the-art results on Human3.6M and HumanEva datasets.
Reduces latency and memory usage compared to multi-stage diffusion methods.
Ensures spatial-temporal coherence in single-stage probabilistic HMP.
Abstract
With intelligent room-side sensing and service robots widely deployed, human motion prediction (HMP) is essential for safe, proactive assistance. However, many existing HMP methods either produce a single, deterministic forecast that ignores uncertainty or rely on probabilistic models that sacrifice kinematic plausibility. Diffusion models improve the accuracy-diversity trade-off but often depend on multi-stage pipelines that are costly for edge deployment. This work focuses on how to ensure spatial-temporal coherence within a single-stage diffusion model for HMP. We introduce SMamDiff, a Spatial Mamba-based Diffusion model with two novel designs: (i) a residual-DCT motion encoding that subtracts the last observed pose before a temporal DCT, reducing the dominance of the first (DC) component and highlighting informative higher-frequency cues so the model learns how joints move rather…
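The residual-DCT idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the `(frames, features)` layout, the orthonormal DCT-II basis, and the toy data are all assumptions made for illustration. It only shows that subtracting the last observed pose before a temporal DCT shrinks the share of energy held by the DC coefficient.

```python
import numpy as np


def dct_basis(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis of shape (n, n); row k is frequency k."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    basis[0] /= np.sqrt(2.0)  # rescale the DC row so the basis is orthonormal
    return basis


def residual_dct(motion: np.ndarray, last_observed: np.ndarray) -> np.ndarray:
    """Subtract the last observed pose, then apply a DCT along the time axis.

    motion: (T, D) array of flattened joint coordinates (assumed layout).
    last_observed: (D,) pose of the final observed frame.
    """
    residual = motion - last_observed[None, :]
    return dct_basis(motion.shape[0]) @ residual


def inverse_residual_dct(coeffs: np.ndarray, last_observed: np.ndarray) -> np.ndarray:
    """Invert residual_dct: inverse DCT, then add the reference pose back."""
    return dct_basis(coeffs.shape[0]).T @ coeffs + last_observed[None, :]


def dc_energy_fraction(coeffs: np.ndarray) -> float:
    """Fraction of total spectral energy carried by the DC (k = 0) row."""
    return float(np.sum(coeffs[0] ** 2) / np.sum(coeffs ** 2))


# Toy data (hypothetical): small oscillation around a large static offset.
T, observed_len = 20, 5
t = np.arange(T)[:, None]
offset = np.array([5.0, -3.0, 2.0])
motion = offset + 0.1 * np.sin(2 * np.pi * t / T)
last_pose = motion[observed_len - 1]

plain = dct_basis(T) @ motion            # DCT of raw coordinates
encoded = residual_dct(motion, last_pose)  # DCT of displacement from last pose

print("DC share, plain DCT:   ", dc_energy_fraction(plain))
print("DC share, residual DCT:", dc_energy_fraction(encoded))
```

In this toy setting the raw coordinates put almost all spectral energy into the DC term because of the static offset, while the residual encoding leaves more of the energy in the coefficients that describe movement. The encoding is lossless given the reference pose, since `inverse_residual_dct` recovers the original sequence.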
Taxonomy
Topics: Gait Recognition and Analysis · Human Pose and Action Recognition · 3D Shape Modeling and Analysis
