CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion
Jiarui Sun, Girish Chowdhary

TL;DR
CoMusion introduces a single-stage, end-to-end, diffusion-based framework for stochastic human motion prediction. It combines Transformer and GCN modules to produce realistic, diverse future pose sequences that stay consistent with the observed motion.
Contribution
The paper presents a novel single-stage diffusion model with a Transformer-GCN architecture for stochastic human motion prediction. It addresses two weaknesses of prior methods: predictions that are inconsistent with the observed history, and complex multi-stage training.
Findings
Outperforms prior methods on benchmark datasets
Produces more realistic and diverse motion predictions
Maintains consistency with observed human motion
Abstract
Stochastic Human Motion Prediction (HMP) aims to predict multiple possible future human pose sequences from observed ones. Most prior works learn motion distributions through encoding-decoding in the latent space, which does not preserve motion's spatial-temporal structure. While effective, these methods often require complex, multi-stage training and yield predictions that are inconsistent with the provided history and can be physically unrealistic. To address these issues, we propose CoMusion, a single-stage, end-to-end diffusion-based stochastic HMP framework. CoMusion is inspired by the insight that a smooth future pose initialization improves prediction performance, a strategy not previously utilized in stochastic models but evidenced in deterministic works. To generate such initialization, CoMusion's motion predictor starts with a Transformer-based network for initial…
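The abstract names two ingredients: a diffusion model over future poses and a smooth initialization of the future. Both can be illustrated with a minimal NumPy sketch. The sketch makes several assumptions that the abstract does not confirm. It uses the standard DDPM forward process with a linear noise schedule. For the "smooth initialization" it simply repeats the last observed pose across the prediction horizon, a padding strategy used by some deterministic predictors. It is not CoMusion's actual design. The function names `smooth_init`, `linear_beta_schedule`, and `q_sample` are hypothetical, and the sizes (25 observed frames, 100 predicted frames, 16 joints) are illustrative only.

```python
import numpy as np


def smooth_init(history: np.ndarray, horizon: int) -> np.ndarray:
    """Hypothetical smooth future initialization (assumption, not CoMusion's method).

    history: (T_obs, J, 3) observed joint positions.
    Returns a (T_obs + horizon, J, 3) sequence whose future frames are
    copies of the last observed pose, giving a smooth starting point.
    """
    last = history[-1:]                          # (1, J, 3)
    future = np.repeat(last, horizon, axis=0)    # (horizon, J, 3)
    return np.concatenate([history, future], axis=0)


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> np.ndarray:
    """Linear DDPM noise schedule (an assumed choice)."""
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(x0: np.ndarray, t: int, alpha_bar: np.ndarray,
             rng: np.random.Generator):
    """Standard DDPM forward step: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps


# Illustrative usage with random data standing in for real poses.
rng = np.random.default_rng(0)
T_obs, T_pred, J = 25, 100, 16
history = rng.standard_normal((T_obs, J, 3))
init = smooth_init(history, T_pred)

betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)              # cumulative product of (1 - beta_t)
future_gt = rng.standard_normal((T_pred, J, 3))  # stand-in for ground-truth future
xt, eps = q_sample(future_gt, t=500, alpha_bar=alpha_bar, rng=rng)
print(init.shape, xt.shape)  # (125, 16, 3) (100, 16, 3)
```

During training, a denoising network would learn to recover the clean future (or the noise `eps`) from `xt`. The smooth initialization gives that network a history-consistent starting point rather than pure noise. How CoMusion actually conditions its Transformer and GCN stages on this is described in the full paper, not in the truncated abstract above.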
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Video Surveillance and Tracking Methods
