Spatiotemporal-Untrammelled Mixture of Experts for Multi-Person Motion Prediction
Zheng Yin, Chengjian Li, Xiangbo Shu, Meiqi Cao, Rui Yan, Jinhui Tang

TL;DR
This paper introduces ST-MoE, a model for multi-person motion prediction that captures complex spatio-temporal dependencies flexibly while reducing computational cost, and that outperforms existing methods on four datasets.
Contribution
The paper proposes the Spatiotemporal-Untrammelled Mixture of Experts (ST-MoE), integrating diverse experts and bidirectional Mamba mechanisms to enhance modeling flexibility and efficiency in human motion prediction.
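To make the mixture-of-experts idea concrete, here is a minimal sketch of softmax gating over four experts. It is an illustration under stated assumptions, not the paper's architecture: the experts here are plain random linear maps standing in for the spatiotemporal experts, no Mamba block is modeled, and the names `TinyMoE`, `gate_weights`, and `forward` are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(matrix, x):
    """Apply a dense matrix (list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


class TinyMoE:
    """Softmax-gated mixture of experts (hypothetical stand-in, not ST-MoE)."""

    def __init__(self, dim, num_experts=4, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        # Assumption: each expert is a dim x dim linear map.
        self.experts = [rand_matrix(dim, dim) for _ in range(num_experts)]
        # The gate produces one score per expert from the input feature.
        self.gate = rand_matrix(num_experts, dim)

    def gate_weights(self, x):
        """Return one non-negative weight per expert, summing to 1."""
        return softmax(linear(self.gate, x))

    def forward(self, x):
        """Blend the expert outputs using the gate weights."""
        weights = self.gate_weights(x)
        outputs = [linear(expert, x) for expert in self.experts]
        return [sum(w * out[i] for w, out in zip(weights, outputs))
                for i in range(len(x))]


moe = TinyMoE(dim=3)
feature = [0.2, -0.1, 0.4]  # made-up feature for one joint at one frame
weights = moe.gate_weights(feature)
mixed = moe.forward(feature)
```

Because the gate weights depend on the input, different features can lean on different experts, which is the general mechanism that lets a mixture adapt to varied spatial or temporal patterns.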
Findings
Outperforms state-of-the-art methods in accuracy on four datasets.
Reduces model parameters by 41.38%.
Achieves 3.6x faster training speed.
Abstract
Comprehensively and flexibly capturing the complex spatio-temporal dependencies of human motion is critical for multi-person motion prediction. Existing methods grapple with two primary limitations: i) inflexible spatiotemporal representation, due to reliance on positional encodings for capturing spatiotemporal information; and ii) high computational costs, stemming from the quadratic time complexity of conventional attention mechanisms. To overcome these limitations, we propose the Spatiotemporal-Untrammelled Mixture of Experts (ST-MoE), which flexibly explores complex spatio-temporal dependencies in human motion and significantly reduces computational cost. To adaptively mine complex spatio-temporal patterns from human motion, our model incorporates four distinct types of spatiotemporal experts, each specializing in capturing different spatial or temporal dependencies. To reduce the…
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Gait Recognition and Analysis
