SPOTR: Spatio-temporal Pose Transformers for Human Motion Prediction

Avinash Ajit Nargund; Misha Sra

arXiv:2303.06277 · cs.CV · March 14, 2023 · 1 citation

Open Access

TL;DR

This paper introduces SPOTR, a non-autoregressive Transformer model that predicts 3D human motion in parallel, offering faster inference and accuracy comparable to state-of-the-art methods across multiple datasets.

Contribution

The paper proposes a novel non-autoregressive Transformer architecture for human motion prediction, leveraging spatio-temporal self-attention to improve speed and activity-agnostic performance.
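
As a rough illustration of what spatio-temporal self-attention can mean, the sketch below applies plain scaled dot-product self-attention along two axes of a motion sequence: across the joints within one frame (spatial) and across the frames for one joint (temporal). This is an assumption made for illustration, not SPOTR's architecture: this summary does not say how the model combines the two axes, and the function names, the `motion[t][j]` layout, and the use of identity query/key/value projections (no learned weights) are all hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Illustrative only: a real Transformer layer would use learned
    projections and multiple heads.
    """
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all tokens.
        outputs.append(
            [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)]
        )
    return outputs


def spatial_attention(motion: List[List[Vector]]) -> List[List[Vector]]:
    """Attend across joints within each frame; motion[t][j] is joint j at frame t."""
    return [self_attention(frame) for frame in motion]


def temporal_attention(motion: List[List[Vector]]) -> List[List[Vector]]:
    """Attend across frames for each joint, then restore the [frame][joint] layout."""
    n_frames, n_joints = len(motion), len(motion[0])
    per_joint = [
        self_attention([motion[t][j] for t in range(n_frames)])
        for j in range(n_joints)
    ]
    return [[per_joint[j][t] for j in range(n_joints)] for t in range(n_frames)]


# Toy input: 2 frames, 3 joints, 2 features per joint.
motion = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]],
]
spatial = spatial_attention(motion)
temporal = temporal_attention(motion)
```

Both passes keep the `[frame][joint][feature]` shape, so they can be stacked or summed. The spatial pass lets joints that are not directly connected influence one another, while the temporal pass mixes information across time for a single joint.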

Findings

01 Achieves results better than or comparable to state-of-the-art methods.

02 Fewer parameters and faster inference.

03 Activity-agnostic and parallel prediction capability.

Abstract

3D human motion prediction is a research area of high significance and a challenge in computer vision. It is useful for the design of many applications including robotics and autonomous driving. Traditionally, autoregressive models have been used to predict human motion. However, these models have high computation needs and error accumulation that make it difficult to use them for real-time applications. In this paper, we present a non-autoregressive model for human motion prediction. We focus on learning spatio-temporal representations non-autoregressively for generation of plausible future motions. We propose a novel architecture that leverages the recently proposed Transformers. Human motion involves complex spatio-temporal dynamics with joints affecting the position and rotation of each other even though they are not connected directly. The proposed model extracts these dynamics using…
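
The contrast the abstract draws between autoregressive and non-autoregressive prediction can be sketched with a toy example. A constant-velocity extrapolation stands in for the learned model here; it is an assumption chosen for brevity, not SPOTR's network, and all function names are hypothetical. The point is the control flow: the autoregressive loop feeds each prediction back in as input, while the non-autoregressive version computes every future frame from the observed history alone.

```python
from typing import Callable, List

Pose = List[float]  # flattened joint coordinates for one frame


def predict_autoregressive(
    history: List[Pose],
    step_fn: Callable[[List[Pose]], Pose],
    horizon: int,
) -> List[Pose]:
    """Predict one frame at a time, feeding each prediction back as input."""
    frames = list(history)
    predictions = []
    for _ in range(horizon):
        next_pose = step_fn(frames)   # depends on earlier predictions
        predictions.append(next_pose)
        frames.append(next_pose)      # any error here propagates to later steps
    return predictions


def predict_parallel(
    history: List[Pose],
    frame_fn: Callable[[List[Pose], int], Pose],
    horizon: int,
) -> List[Pose]:
    """Predict every future frame from the observed history only.

    No prediction depends on another, so the calls could run in parallel.
    """
    return [frame_fn(history, t) for t in range(1, horizon + 1)]


def constant_velocity_step(frames: List[Pose]) -> Pose:
    """Toy one-step model: extrapolate from the last two frames."""
    prev, last = frames[-2], frames[-1]
    return [l + (l - p) for p, l in zip(prev, last)]


def constant_velocity_frame(history: List[Pose], t: int) -> Pose:
    """Toy direct model: jump t steps ahead of the last observed frame."""
    prev, last = history[-2], history[-1]
    return [l + t * (l - p) for p, l in zip(prev, last)]


observed = [[0.0, 0.0], [1.0, 2.0]]
sequential = predict_autoregressive(observed, constant_velocity_step, horizon=3)
parallel = predict_parallel(observed, constant_velocity_frame, horizon=3)
```

With this exact toy model the two strategies agree. They differ in practice because a learned `step_fn` is imperfect and its errors compound through the feedback loop, which is the error accumulation the abstract refers to.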

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Human Motion and Animation

Methods: Multi-Head Attention · Attention Is All You Need · Residual Connection · Dense Connections · Absolute Position Encodings · Linear Layer · Label Smoothing · Convolution · Dropout · Adam