Multi-Person 3D Motion Prediction with Multi-Range Transformers

Jiashun Wang; Huazhe Xu; Medhini Narasimhan; Xiaolong Wang

arXiv:2111.12073 · cs.CV · November 24, 2021 · 35 cites

TL;DR

This paper introduces Multi-Range Transformers, a framework for multi-person 3D motion prediction that models both individual motion and social interactions, predicts the motions of multiple persons simultaneously, and outperforms state-of-the-art methods on long-term prediction.

Contribution

The Multi-Range Transformers model combines a local-range encoder for individual motion with a global-range encoder for social interactions, enabling simultaneous motion prediction for many persons.

Findings

01

Outperforms state-of-the-art long-term 3D motion prediction methods.

02

Generates diverse social interaction scenarios.

03

Predicts 15-person motion simultaneously by automatically dividing the persons into interaction groups.

Abstract

We propose a novel framework for multi-person 3D motion trajectory prediction. Our key observation is that a human's actions and behaviors may depend heavily on the other persons nearby. Thus, instead of predicting each human pose trajectory in isolation, we introduce a Multi-Range Transformers model which consists of a local-range encoder for individual motion and a global-range encoder for social interactions. The Transformer decoder then performs prediction for each person by taking a corresponding pose as a query which attends to both local- and global-range encoder features. Our model not only outperforms state-of-the-art methods on long-term 3D motion prediction, but also generates diverse social interactions. More interestingly, our model can even predict 15-person motion simultaneously by automatically dividing the persons into different interaction groups. Project page with code…
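
The query-to-two-memories data flow described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (the official PyTorch code is in the repository listed below); it is a toy, weight-free example in plain Python that assumes single-head scaled dot-product attention and assumes the local and global contexts are fused by simple addition. The names `attend`, `decode_person`, `local_memory`, and `global_memory` are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Scaled dot-product attention of one query vector over memory rows.

    Returns the attention-weighted sum of the rows and the weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, row)) / scale for row in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    context = [sum(w * row[i] for w, row in zip(weights, memory))
               for i in range(dim)]
    return context, weights


def decode_person(query_pose, local_memory, global_memory):
    """One person's pose query attends to both feature sets.

    Assumption: the two contexts are fused by element-wise addition;
    the paper's decoder may combine them differently.
    """
    local_ctx, _ = attend(query_pose, local_memory)
    global_ctx, global_weights = attend(query_pose, global_memory)
    fused = [l + g for l, g in zip(local_ctx, global_ctx)]
    return fused, global_weights


# Toy 2-D "pose features" for two persons over two past frames.
person_a = [[1.0, 0.0], [0.9, 0.1]]
person_b = [[0.0, 1.0], [0.1, 0.9]]

local_memory = person_a              # local range: the person's own motion
global_memory = person_a + person_b  # global range: everyone in the scene
query_pose = person_a[-1]            # the person's most recent pose as query

fused, global_weights = decode_person(query_pose, local_memory, global_memory)
print(fused)
print(global_weights)
```

In this toy setup the global attention weights show how strongly the queried person attends to each frame of each person in the scene, which is the kind of signal that could separate persons into interaction groups. A real model would add learned projections, multiple heads, positional encodings, and stacked layers.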

Code & Models

Repositories

jiashunwang/MRT · PyTorch · Official

Videos

Multi-Person 3D Motion Prediction with Multi-Range Transformers · SlidesLive

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications

Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Multi-Head Attention · Dense Connections · Softmax · Residual Connection · Layer Normalization · Adam