Learning Reactive Human Motion Generation from Paired Interaction Data Using Transformer-Based Models

Masato Soga; Ryuki Takebayashi

arXiv:2604.22164·cs.CV·April 27, 2026

TL;DR

This paper studies how to generate one person's motion from another person's motion in two-person interactions using Transformer-based models. It introduces a person ID embedding that improves motion consistency and prevents structural collapse.

Contribution

It demonstrates that Transformer-based models with a person ID embedding are effective for interaction-aware human motion generation from paired action-reaction data.
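The page does not describe how the person ID embedding is wired into the models, so the following is only a minimal sketch of one common way such an embedding could work: a small lookup table with one vector per person, added to every motion token that belongs to that person. The names `make_embedding_table` and `add_person_id`, the token layout, and the dimension are all hypothetical, and in a real model the table would be learned rather than random.

```python
import random


def make_embedding_table(num_ids, dim, seed=0):
    """Build a lookup table with one vector per person ID.

    Assumption: values are random here for illustration only; in a
    trained model these vectors would be learnable parameters.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_ids)]


def add_person_id(tokens, person_ids, table):
    """Add each token's person ID vector to the token, element-wise."""
    if len(tokens) != len(person_ids):
        raise ValueError("tokens and person_ids must have the same length")
    tagged = []
    for token, pid in zip(tokens, person_ids):
        id_vector = table[pid]
        tagged.append([t + e for t, e in zip(token, id_vector)])
    return tagged


# Hypothetical layout: 3 frames for the acting person (ID 0) followed by
# 3 frames for the reacting person (ID 1), each frame a 4-dim feature.
DIM = 4
table = make_embedding_table(num_ids=2, dim=DIM)
actor_tokens = [[0.0] * DIM for _ in range(3)]
reactor_tokens = [[0.0] * DIM for _ in range(3)]
tokens = actor_tokens + reactor_tokens
person_ids = [0] * 3 + [1] * 3

tagged = add_person_id(tokens, person_ids, table)
```

With this tagging, two otherwise identical pose features become distinguishable by which person they belong to, which is the kind of signal a shared sequence model would need to keep the two bodies apart.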

Findings

01

A simple Transformer generates plausible interaction motions without posture collapse.

02

A person ID embedding improves motion consistency and structural stability.

03

iTransformer and Crossformer models tend to accumulate errors over time.
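The listing does not say how the models are rolled out at inference time. As a generic illustration of what accumulating errors over time can look like, the toy sketch below assumes predictions are fed back as inputs (an autoregressive rollout, which is an assumption and not confirmed by the source). It uses a hypothetical scalar system and a slightly wrong one-step model; nothing here reproduces the paper's models or results.

```python
def rollout_errors(true_a=0.99, model_a=1.02, x0=1.0, steps=50):
    """Compare one-step error with closed-loop (fed-back) error.

    Toy assumption: the true signal follows x[t+1] = true_a * x[t] and
    the model predicts x[t+1] = model_a * x[t] with a slightly wrong
    coefficient.
    """
    one_step = []     # model sees the true previous value at every step
    closed_loop = []  # model sees its own previous prediction
    x_true = x0
    x_pred = x0
    for _ in range(steps):
        next_true = true_a * x_true
        one_step.append(abs(model_a * x_true - next_true))
        x_pred = model_a * x_pred
        closed_loop.append(abs(x_pred - next_true))
        x_true = next_true
    return one_step, closed_loop


one_step, closed_loop = rollout_errors()
print("largest one-step error:", max(one_step))
print("final closed-loop error:", closed_loop[-1])
```

In this toy setting the one-step error stays small at every step, while the closed-loop error keeps growing because each prediction inherits the error of the previous one.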

Abstract

Recent advances in deep learning have enabled the generation of videos from textual descriptions as well as the prediction of future sequences from input videos. Similarly, in human motion modeling, motions can be generated from text or predicted from a single person's motion sequence. However, these approaches primarily focus on single-agent motion generation. In contrast, this study addresses the problem of generating the motion of one person based on the motion of another in interaction scenarios, where the two motions are mutually dependent. We construct a dataset of paired action-reaction motion sequences extracted from boxing match videos and investigate the effectiveness of Transformer-based models for this task. Specifically, we implement and compare three models: a simple Transformer, iTransformer, and Crossformer. In addition, we introduce a person ID embedding to explicitly…
