Massively Multi-Person 3D Human Motion Forecasting with Scene Context
Felix B Mueller, Julian Tanke, Juergen Gall

TL;DR
This paper introduces SAST, a scene-aware social transformer model that effectively forecasts long-term multi-person 3D human motion by incorporating scene context and interactions, outperforming previous methods on realism and diversity.
Contribution
The paper presents a novel scene-aware social transformer model that models interactions among varying numbers of people and objects for long-term 3D human motion prediction.
Findings
Outperforms previous models in realism and diversity metrics
Effectively models interactions among multiple people and objects
Achieves superior results on the Humans in Kitchens dataset
Abstract
Forecasting long-term 3D human motion is challenging: the stochasticity of human behavior makes it hard to generate realistic human motion from the input sequence alone. Information on the scene environment and the motion of nearby people can greatly aid the generation process. We propose a scene-aware social transformer model (SAST) to forecast long-term (10s) human motion. Unlike previous models, our approach can model interactions between both widely varying numbers of people and objects in a scene. We combine a temporal convolutional encoder-decoder architecture with a Transformer-based bottleneck that allows us to efficiently combine motion and scene information. We model the conditional motion distribution using denoising diffusion models. We benchmark our approach on the Humans in Kitchens dataset, which contains 1 to 16 persons and 29 to 50 objects that are visible…
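The abstract says the conditional motion distribution is modeled with denoising diffusion models. As a rough illustration of that idea only, the sketch below shows the standard forward (noising) step of a generic denoising diffusion model applied to a flat list of pose coordinates. It is not the SAST implementation: the linear noise schedule, the step count, the function names, and the example pose are all assumptions made for illustration, and the conditioning on scene objects and nearby people described in the abstract is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.

    x0 is a flat list of pose coordinates and t is a 0-based step index.
    Returns the noisy sample together with the Gaussian noise that was drawn.
    """
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal_scale * x + noise_scale * e for x, e in zip(x0, noise)]
    return xt, noise


# Hypothetical usage: noise a single 3D joint position halfway through
# an assumed 1000-step schedule.
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
pose = [0.1, -0.3, 0.9]
noisy_pose, eps = add_noise(pose, 500, alpha_bars, rng)
```

In a diffusion model of this kind, a network is trained to recover the drawn noise (or the clean sample) from the noisy input. In a setup like the one the abstract describes, that network would additionally receive the observed motion and the scene context as conditioning.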
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Gait Recognition and Analysis
Methods: Diffusion
