MDMP: Multi-modal Diffusion for supervised Motion Predictions with uncertainty

Leo Bringer; Joey Wilson; Kira Barton; Maani Ghaffari

arXiv:2410.03860 · cs.CV · June 3, 2025

TL;DR

This paper presents MDMP, a multi-modal diffusion model that combines skeletal data and text to produce accurate, long-term human motion predictions with quantifiable uncertainty, enhancing control and spatial awareness.

Contribution

The paper introduces a novel multi-modal diffusion framework that integrates skeletal and textual data for improved long-term motion prediction with uncertainty estimation.

Findings

01 Outperforms existing methods in long-term motion prediction accuracy.

02 Effectively captures multiple motion modes through diffusion modeling.

03 Provides uncertainty estimates that improve spatial awareness in human-robot interaction.

Abstract

This paper introduces a Multi-modal Diffusion model for Motion Prediction (MDMP) that integrates and synchronizes skeletal data and textual descriptions of actions to generate refined long-term motion predictions with quantifiable uncertainty. Existing methods for motion forecasting or motion generation rely solely on either prior motions or text prompts, and so face limitations in precision or control, particularly over extended durations. The multi-modal nature of our approach enhances the contextual understanding of human motion, while our graph-based transformer framework effectively captures both spatial and temporal motion dynamics. As a result, our model consistently outperforms existing generative techniques in accurately predicting long-term motions. Additionally, by leveraging diffusion models' ability to capture different modes of prediction, we estimate uncertainty,…
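The abstract is cut off before it says how uncertainty is computed, so the following is only a generic sketch of one common approach, not the paper's method: draw several stochastic predictions for the same input and measure how far each joint spreads across them at each frame. The function `sample_motion` is a hypothetical stand-in for a generative model, and the frame count, joint count, and noise levels are made up for illustration.

```python
import random
import statistics


def sample_motion(rng, num_frames=4, num_joints=2):
    """Hypothetical stand-in for one stochastic model sample (an assumption).

    Returns a [frame][joint] grid of (x, y, z) positions. The noise scale
    grows with the frame index to mimic drift over longer horizons.
    """
    motion = []
    for t in range(num_frames):
        noise = 0.01 * (t + 1)
        frame = [
            tuple(float(j) + rng.gauss(0.0, noise) for _ in range(3))
            for j in range(num_joints)
        ]
        motion.append(frame)
    return motion


def joint_uncertainty(samples):
    """Per-frame, per-joint spread across a list of sampled motions.

    The spread is the square root of the summed per-axis population
    variances, so it has the same units as the joint positions.
    """
    num_frames = len(samples[0])
    num_joints = len(samples[0][0])
    spread = []
    for t in range(num_frames):
        row = []
        for j in range(num_joints):
            total_var = 0.0
            for axis in range(3):
                values = [s[t][j][axis] for s in samples]
                total_var += statistics.pvariance(values)
            row.append(total_var ** 0.5)
        spread.append(row)
    return spread


rng = random.Random(0)
samples = [sample_motion(rng) for _ in range(50)]
uncertainty = joint_uncertainty(samples)  # uncertainty[t][j] is a radius-like spread
```

A value like `uncertainty[t][j]` could be read as a rough radius around joint `j` at frame `t` in which the person is likely to be, which is the kind of signal a robot planner might use to keep a safety margin.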

Code & Models

Repositories

leob03/mdmp (PyTorch, official)

Taxonomy

Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Medical Image Segmentation Techniques

Methods: Diffusion