Interaction Transformer for Human Reaction Generation
Baptiste Chopin, Hao Tang, Naima Otberdout, Mohamed Daoudi, Nicu Sebe

TL;DR
This paper introduces InterFormer, a Transformer-based model with spatial and temporal attention that generates a human reaction from an input action and outperforms baselines on the SBU, K3HI, and DuetDance datasets.
Contribution
The paper presents a novel interaction Transformer with graph-enhanced spatial attention and temporal modeling for reaction generation, enabling complex and long-term interaction synthesis.
Findings
InterFormer outperforms baselines on SBU, K3HI, and DuetDance datasets.
The model effectively captures temporal dependencies in motion.
Graph-based spatial attention improves focus on relevant body joints.
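As a rough illustration of what capturing temporal dependencies with attention means, the sketch below applies single-head scaled dot-product self-attention across the frames of one pose sequence. It is not the InterFormer implementation: the function names, the identity query/key/value projections, the random input, and the choice of 15 joints per skeleton are all assumptions made for the example.

```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def temporal_self_attention(frames):
    """Self-attention over time (hypothetical helper, not from the paper).

    frames: (T, D) array with one flattened pose per time step.
    Queries, keys, and values are the raw frames (identity projections),
    which is a simplifying assumption; a real model would learn them.
    Returns the attended frames (T, D) and the attention weights (T, T).
    """
    d = frames.shape[-1]
    scores = frames @ frames.T / np.sqrt(d)   # frame-to-frame similarity
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ frames, weights


# Toy input: 8 frames, 15 joints with 3D coordinates (illustrative sizes).
rng = np.random.default_rng(0)
T, J = 8, 15
action = rng.normal(size=(T, J * 3))
context, weights = temporal_self_attention(action)
```

Each row of `weights` says how much one frame draws on every other frame, which is the sense in which attention links distant time steps of a motion.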
Abstract
We address the challenging task of human reaction generation, which aims to generate a corresponding reaction based on an input action. Most of the existing works do not focus on generating and predicting the reaction and cannot generate the motion when only the action is given as input. To address this limitation, we propose a novel interaction Transformer (InterFormer) consisting of a Transformer network with both temporal and spatial attention. Specifically, temporal attention captures the temporal dependencies of the motion of both characters and of their interaction, while spatial attention learns the dependencies between the different body parts of each character and those which are part of the interaction. Moreover, we propose using graphs to increase the performance of spatial attention via an interaction distance module that helps focus on nearby joints from both characters.…
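The abstract does not give the formula for the interaction distance module, so the following is only one plausible reading, sketched under stated assumptions: pairwise Euclidean distances between the joints of the two characters are subtracted from the attention scores before the softmax, so that nearby joints receive more weight. The function names, the `scale` parameter, and the toy coordinates are hypothetical.

```python
import numpy as np


def interaction_distance(joints_a, joints_b):
    """Pairwise Euclidean distances between two skeletons.

    joints_a: (Ja, 3) joint positions of the acting character.
    joints_b: (Jb, 3) joint positions of the reacting character.
    Returns a (Ja, Jb) distance matrix.
    """
    diff = joints_a[:, None, :] - joints_b[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def distance_biased_attention(scores, distances, scale=1.0):
    """Softmax over scores penalised by distance (an assumed form, not the paper's)."""
    biased = scores - scale * distances
    biased = biased - biased.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(biased)
    return weights / weights.sum(axis=-1, keepdims=True)


# Toy example: character A's joints sit at the origin, while character B's
# joints lie at increasing distances along the x axis.
joints_a = np.zeros((3, 3))
joints_b = np.array([[0.1, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [5.0, 0.0, 0.0]])
dist = interaction_distance(joints_a, joints_b)

# With uniform raw scores, the weights depend on distance alone.
weights = distance_biased_attention(np.zeros((3, 3)), dist)
```

With uniform raw scores, each joint of character A attends most to the closest joint of character B, which matches the stated goal of focusing on nearby joints from both characters.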
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Time Series Analysis and Forecasting
Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Dense Connections · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing
