Interaction Transformer for Human Reaction Generation

Baptiste Chopin; Hao Tang; Naima Otberdout; Mohamed Daoudi; Nicu Sebe

arXiv:2207.01685 · cs.CV · February 3, 2023

TL;DR

This paper introduces InterFormer, a Transformer-based model with spatial and temporal attention that generates a human reaction from an input action and outperforms existing methods on the SBU, K3HI, and DuetDance datasets.

Contribution

The paper presents a novel interaction Transformer with graph-enhanced spatial attention and temporal modeling for reaction generation, enabling complex and long-term interaction synthesis.

Findings

1. InterFormer outperforms baselines on the SBU, K3HI, and DuetDance datasets.

2. The model effectively captures temporal dependencies in motion.

3. Graph-based spatial attention improves focus on relevant body joints.

Abstract

We address the challenging task of human reaction generation, which aims to generate a corresponding reaction based on an input action. Most existing works do not focus on generating and predicting the reaction, and cannot generate the motion when only the action is given as input. To address this limitation, we propose a novel interaction Transformer (InterFormer) consisting of a Transformer network with both temporal and spatial attention. Specifically, temporal attention captures the temporal dependencies of the motion of both characters and of their interaction, while spatial attention learns the dependencies between the different body parts of each character and those that take part in the interaction. Moreover, we propose using graphs to increase the performance of spatial attention via an interaction distance module that helps focus on nearby joints from both characters.…
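To make the idea of distance-aware spatial attention concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes the distance information enters as an additive bias on the attention scores, which is one common way to inject a graph or distance prior, and the paper's interaction distance module may be formulated differently. All function names, the joint count, and the feature size are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def interaction_distance_bias(action_joints, reaction_joints):
    """Negative Euclidean distance between every reaction/action joint pair.

    Both inputs have shape (J, 3). The result has shape (J, J), where entry
    [i, j] relates reaction joint i to action joint j. Closer pairs get a
    larger (less negative) value. Using this as an additive bias is an
    assumption of this sketch, not a detail taken from the paper.
    """
    diff = reaction_joints[:, None, :] - action_joints[None, :, :]
    return -np.linalg.norm(diff, axis=-1)


def spatial_cross_attention(q, k, v, bias):
    """Scaled dot-product attention over joints with an additive bias.

    q: (J, d) queries from the reacting character's joints.
    k, v: (J, d) keys and values from the acting character's joints.
    bias: (J, J) term added to the scores before the softmax.
    Returns the attended features (J, d) and the attention weights (J, J).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d) + bias
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Toy data: 5 joints per character, 8 feature channels (illustrative sizes).
rng = np.random.default_rng(0)
J, d = 5, 8
action_joints = rng.normal(size=(J, 3))
reaction_joints = rng.normal(size=(J, 3))
q = rng.normal(size=(J, d))
k = rng.normal(size=(J, d))
v = rng.normal(size=(J, d))

bias = interaction_distance_bias(action_joints, reaction_joints)
out, weights = spatial_cross_attention(q, k, v, bias)
```

With this bias, a reaction joint whose learned scores are otherwise equal puts the most weight on the nearest joint of the other character, which is the "focus on nearby joints" behaviour the abstract describes. Temporal attention could be sketched the same way by attending over frames instead of joints.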

Code & Models

Repositories

cristal-3dsam/interformer (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Time Series Analysis and Forecasting

Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Dense Connections · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing