Learning Reactive Human Motion Generation from Paired Interaction Data Using Transformer-Based Models
Masato Soga, Ryuki Takebayashi

TL;DR
This paper explores generating one person's motion from another person's motion in two-person interaction scenarios using Transformer models, and introduces person ID embeddings to improve motion consistency and prevent structural collapse.
Contribution
It demonstrates the effectiveness of Transformer-based models with person ID embeddings for interaction-aware human motion generation from paired data.
Findings
A simple Transformer generates plausible interaction motions without posture collapse.
Person ID embedding improves motion consistency and structural stability.
iTransformer and Crossformer models tend to accumulate errors over time.
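To make the person ID embedding idea concrete, here is a minimal sketch of one way it could work. The summary does not say how the embedding is combined with the motion tokens, so adding it to each frame and concatenating the two people along the time axis are both assumptions. The function names (`make_embedding_table`, `add_person_id`, `build_input`) are hypothetical, and the random table stands in for weights that would be learned during training.

```python
import random

def make_embedding_table(num_ids, dim, seed=0):
    """Return num_ids random vectors of length dim (stand-in for a learned table)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_ids)]

def add_person_id(tokens, person_id, table):
    """Add the embedding for person_id to every frame token (assumed: additive)."""
    emb = table[person_id]
    return [[x + e for x, e in zip(frame, emb)] for frame in tokens]

def build_input(actor_tokens, reactor_tokens, table):
    """Tag each person's tokens with their ID embedding, then join along time.

    Assumption: ID 0 is the acting person and ID 1 the reacting person.
    """
    return add_person_id(actor_tokens, 0, table) + add_person_id(reactor_tokens, 1, table)

# Toy example: 3 frames per person, 4 pose features per frame.
dim = 4
table = make_embedding_table(num_ids=2, dim=dim)
actor = [[0.0] * dim for _ in range(3)]
reactor = [[0.0] * dim for _ in range(3)]
sequence = build_input(actor, reactor, table)
```

With this tagging, two otherwise identical pose vectors become distinguishable by which person they belong to, which is one plausible reason such an embedding would help keep each person's skeleton consistent.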
Abstract
Recent advances in deep learning have enabled the generation of videos from textual descriptions as well as the prediction of future sequences from input videos. Similarly, in human motion modeling, motions can be generated from text or predicted from a single person's motion sequence. However, these approaches primarily focus on single-agent motion generation. In contrast, this study addresses the problem of generating the motion of one person based on the motion of another in interaction scenarios, where the two motions are mutually dependent. We construct a dataset of paired action-reaction motion sequences extracted from boxing match videos and investigate the effectiveness of Transformer-based models for this task. Specifically, we implement and compare three models: a simple Transformer, iTransformer, and Crossformer. In addition, we introduce a person ID embedding to explicitly…
