SPOTR: Spatio-temporal Pose Transformers for Human Motion Prediction
Avinash Ajit Nargund, Misha Sra

TL;DR
This paper introduces SPOTR, a non-autoregressive Transformer model that predicts 3D human motion in parallel, offering faster inference with accuracy comparable to state-of-the-art methods across multiple datasets.
Contribution
The paper proposes a novel non-autoregressive Transformer architecture for human motion prediction, leveraging spatio-temporal self-attention to improve speed and activity-agnostic performance.
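As a rough illustration of what spatio-temporal self-attention over a pose sequence can look like, the sketch below applies scaled dot-product attention once across the joints of each frame (spatial) and once across the frames of each joint (temporal). It is not the authors' architecture: the function names, the identity query/key/value projections, the single head, and the fusion of the two branches by summation are all assumptions made to keep the example short.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections), so there are no learned weights here.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out

def spatio_temporal_attention(seq):
    """seq[t][j] is the feature vector of joint j at frame t."""
    num_frames, num_joints = len(seq), len(seq[0])
    # Spatial branch: within each frame, every joint attends to all joints.
    spatial = [self_attention(frame) for frame in seq]
    # Temporal branch: for each joint, every frame attends to all frames.
    temporal = [
        self_attention([seq[t][j] for t in range(num_frames)])
        for j in range(num_joints)
    ]
    # Assumed fusion: element-wise sum of the two branches.
    return [
        [
            [s + temporal[j][t][i] for i, s in enumerate(spatial[t][j])]
            for j in range(num_joints)
        ]
        for t in range(num_frames)
    ]

# Toy input: 3 frames, 2 joints, 3 features per joint.
poses = [[[0.1 * t, 0.2 * j, 1.0] for j in range(2)] for t in range(3)]
features = spatio_temporal_attention(poses)
print(len(features), len(features[0]), len(features[0][0]))
```

Because the spatial branch attends over all joints in a frame, a joint can draw on joints it is not directly connected to in the skeleton, which is the kind of dependency the abstract describes.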
Findings
Achieves better or comparable results to state-of-the-art methods.
Fewer parameters and faster inference.
Activity-agnostic and parallel prediction capability.
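The difference between autoregressive rollout and parallel prediction can be sketched with a toy constant-velocity predictor. Everything here is hypothetical and purely illustrative: the function names, the constant-velocity "model", and the fixed per-call error `EPS` are assumptions, not results or code from the paper, and a real non-autoregressive model may still show error that grows with the horizon.

```python
EPS = 0.01  # assumed error added by each model call (illustration only)

def predict_autoregressive(last_pose, velocity, horizon):
    """Roll out one frame at a time; each call consumes the previous output."""
    poses, current = [], last_pose
    for _ in range(horizon):  # `horizon` sequential, dependent model calls
        current = [p + v + EPS for p, v in zip(current, velocity)]
        poses.append(current)
    return poses

def predict_parallel(last_pose, velocity, horizon):
    """Produce every future frame directly from the observed pose.

    No frame depends on another predicted frame, so all frames could be
    computed at the same time.
    """
    return [
        [p + k * v + EPS for p, v in zip(last_pose, velocity)]
        for k in range(1, horizon + 1)
    ]

def max_error(poses, last_pose, velocity):
    """Largest absolute deviation from the true constant-velocity path."""
    return max(
        abs(x - (p + k * v))
        for k, pose in enumerate(poses, start=1)
        for x, p, v in zip(pose, last_pose, velocity)
    )

start, vel = [0.0, 0.0], [1.0, 2.0]
ar = predict_autoregressive(start, vel, horizon=10)
par = predict_parallel(start, vel, horizon=10)
print(max_error(ar, start, vel), max_error(par, start, vel))
```

In this toy, the autoregressive rollout feeds each erroneous output back in, so its error compounds over the horizon, while the parallel version pays the per-call error only once per frame.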
Abstract
3D human motion prediction is a significant and challenging research area in computer vision. It is useful for the design of many applications, including robotics and autonomous driving. Traditionally, autoregressive models have been used to predict human motion. However, these models have high computational needs and suffer from error accumulation, which makes them difficult to use in real-time applications. In this paper, we present a non-autoregressive model for human motion prediction. We focus on learning spatio-temporal representations non-autoregressively to generate plausible future motions. We propose a novel architecture that leverages the recently proposed Transformers. Human motion involves complex spatio-temporal dynamics, with joints affecting the position and rotation of each other even though they are not connected directly. The proposed model extracts these dynamics using…
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Human Motion and Animation
Methods: Multi-Head Attention · Attention Is All You Need · Residual Connection · Dense Connections · Absolute Position Encodings · Linear Layer · Label Smoothing · Convolution · Dropout · Adam
