TL;DR
This paper introduces spatial-temporal anchor-based sampling (STARS) for human motion prediction. It pairs randomly sampled codes with deterministic, learnable anchors to improve sample diversity and accuracy over purely likelihood-based sampling, which is prone to mode collapse.
Contribution
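It proposes STARS, a sampling scheme built on spatial and temporal anchors that can in principle be applied to different motion predictors, together with an interaction-enhanced spatial-temporal graph convolutional network (IE-STGCN), to mitigate mode collapse in diverse human motion prediction.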
Findings
Outperforms state-of-the-art methods in both stochastic and deterministic prediction
Provides interpretable control over spatial-temporal disparity
Demonstrates effectiveness across various motion prediction tasks
Abstract
Predicting diverse human motions given a sequence of historical poses has received increasing attention. Despite rapid progress, existing work captures the multi-modal nature of human motions primarily through likelihood-based sampling, where mode collapse has been widely observed. In this paper, we propose a simple yet effective approach that disentangles randomly sampled codes with a deterministic learnable component named anchors to promote sample precision and diversity. Anchors are further factorized into spatial anchors and temporal anchors, which provide attractively interpretable control over spatial-temporal disparity. In principle, our spatial-temporal anchor-based sampling (STARS) can be applied to different motion predictors. Here we propose an interaction-enhanced spatial-temporal graph convolutional network (IE-STGCN) that encodes prior knowledge of human motions…
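The anchor idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the tensor shapes, the additive combination of spatial and temporal anchors, the shared noise code, and the names `combine_anchors` and `sample_codes` are all assumptions made for illustration, and the anchors here are randomly initialized rather than learned.

```python
import numpy as np


def combine_anchors(spatial, temporal):
    """Combine factorized anchors into one anchor per (spatial, temporal) pair.

    Assumed shapes (hypothetical, for illustration only):
      spatial:  (Ks, J, C)  one offset per joint, constant over time
      temporal: (Kt, T, C)  one offset per frame, constant over joints
    Returns an array of shape (Ks * Kt, T, J, C).
    """
    ks, j, c = spatial.shape
    kt, t, _ = temporal.shape
    # Broadcast to (Ks, Kt, T, J, C) and add the two factors.
    combined = spatial[:, None, None, :, :] + temporal[None, :, :, None, :]
    return combined.reshape(ks * kt, t, j, c)


def sample_codes(anchors, rng, noise_scale=1.0):
    """Add one shared random code to every deterministic anchor.

    Because the noise is shared, the K resulting codes differ only through
    their anchors, which is what keeps the samples apart.
    """
    noise = rng.standard_normal(anchors.shape[1:]) * noise_scale
    return anchors + noise[None]


rng = np.random.default_rng(0)
spatial = rng.standard_normal((3, 5, 4))   # Ks=3 anchors, J=5 joints, C=4 channels
temporal = rng.standard_normal((2, 6, 4))  # Kt=2 anchors, T=6 frames, C=4 channels
anchors = combine_anchors(spatial, temporal)
codes = sample_codes(anchors, rng)
print(anchors.shape, codes.shape)
```

In this sketch, holding the temporal index fixed and varying the spatial index changes only the per-joint offsets, which is one way to read the "interpretable control over spatial-temporal disparity" described in the abstract. Each of the resulting codes would then be passed to a motion predictor to produce one future motion.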
