TransFusion: A Practical and Effective Transformer-based Diffusion Model for 3D Human Motion Prediction
Sibo Tian, Minghui Zheng, and Xiao Liang

TL;DR
TransFusion is a Transformer-based diffusion model for 3D human motion prediction that balances accuracy and diversity through frequency-domain modeling and a lightweight design.
Contribution
The paper presents a novel diffusion model using Transformers and frequency domain techniques for more accurate and diverse 3D human motion prediction, with a simplified input conditioning approach.
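As a rough illustration of what modeling motion in the frequency domain can look like, the sketch below projects a toy motion sequence onto a truncated discrete cosine transform (DCT) basis along the time axis and reconstructs it. This summary does not name the transform the paper uses, so the choice of DCT is an assumption. The helper names (`dct_basis`, `to_frequency`, `to_time`) and the sequence sizes are also hypothetical, chosen only for the example.

```python
import numpy as np

def dct_basis(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis; row k is the k-th cosine frequency over n frames."""
    t = np.arange(n)
    k = t.reshape(-1, 1)
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    basis[0] /= np.sqrt(2.0)  # rescale the constant row so the matrix is orthonormal
    return basis

def to_frequency(motion: np.ndarray, keep: int) -> np.ndarray:
    """Project a (frames, features) sequence onto the first `keep` DCT rows."""
    return dct_basis(motion.shape[0])[:keep] @ motion

def to_time(coeffs: np.ndarray, frames: int) -> np.ndarray:
    """Map (keep, features) coefficients back to a (frames, features) sequence."""
    keep = coeffs.shape[0]
    return dct_basis(frames)[:keep].T @ coeffs

# Toy data: arbitrary sizes, not taken from the paper.
rng = np.random.default_rng(0)
frames, joints = 125, 16
t = np.linspace(0.0, 1.0, frames)[:, None]
motion = np.sin(2 * np.pi * t * rng.uniform(0.5, 2.0, size=(1, joints * 3)))

coeffs = to_frequency(motion, keep=20)   # compact low-frequency representation
recon = to_time(coeffs, frames)          # smooth approximation of the input
max_error = np.abs(recon - motion).max()
```

Keeping only the low-frequency rows gives a shorter, smoother representation of the trajectory, which is one common reason motion models work on such coefficients rather than on raw frames.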
Findings
Outperforms existing models on benchmark datasets
Generates more realistic and diverse human motion sequences
Maintains high prediction accuracy with a lightweight architecture
Abstract
Predicting human motion plays a crucial role in ensuring a safe and effective human-robot close collaboration in intelligent remanufacturing systems of the future. Existing works can be categorized into two groups: those focusing on accuracy, predicting a single future motion, and those generating diverse predictions based on observations. The former group fails to address the uncertainty and multi-modal nature of human motion, while the latter group often produces motion sequences that deviate too far from the ground truth or become unrealistic within historical contexts. To tackle these issues, we propose TransFusion, an innovative and practical diffusion-based model for 3D human motion prediction which can generate samples that are more likely to happen while maintaining a certain level of diversity. Our model leverages Transformer as the backbone with long skip connections between…
Taxonomy
Topics: Optical Imaging and Spectroscopy Techniques · Human Pose and Action Recognition
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Position-Wise Feed-Forward Layer · Linear Layer · Dense Connections · Label Smoothing · Dropout · Adam · Absolute Position Encodings
