Comparison of Spatio-Temporal Models for Human Motion and Pose Forecasting in Face-to-Face Interaction Scenarios

German Barquero, Johnny Núñez, Zhen Xu, Sergio Escalera, Wei-Wei Tu, Isabelle Guyon, Cristina Palmero

arXiv:2203.03245 · cs.CV · March 8, 2022

TL;DR

This paper systematically compares spatio-temporal models for human motion and pose forecasting in face-to-face dyadic interactions. Attention-based methods perform best on UDIVA v0.5, and models trained for the short-term future (<400 ms), applied autoregressively, outperform the baselines up to 2 s ahead, even when trained on highly noisy annotations.

Contribution

The paper presents the first systematic comparison of state-of-the-art approaches for behavior forecasting in dyadic interactions, using whole-body annotations (face, body, and hands) from the UDIVA v0.5 dataset.

Findings

01

The best attention-based models achieve state-of-the-art performance on UDIVA v0.5.

02

Models trained for the short-term future (<400 ms), applied autoregressively, outperform the baselines for considerably longer horizons (up to 2 s).

03

Robustness to noisy annotations suggests potential for weakly-supervised learning.

Abstract

Human behavior forecasting during human-human interactions is of utmost importance to provide robotic or virtual agents with social intelligence. This problem is especially challenging for scenarios that are highly driven by interpersonal dynamics. In this work, we present the first systematic comparison of state-of-the-art approaches for behavior forecasting. To do so, we leverage whole-body annotations (face, body, and hands) from the very recently released UDIVA v0.5, which features face-to-face dyadic interactions. Our best attention-based approaches achieve state-of-the-art performance in UDIVA v0.5. We show that by autoregressively predicting the future with methods trained for the short-term future (<400ms), we outperform the baselines even for a considerably longer-term future (up to 2s). We also show that this finding holds when highly noisy annotations are used, which opens…
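The autoregressive strategy mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `constant_velocity_predictor` is a hypothetical stand-in for a trained short-term model, the flattened `Pose` layout is an assumption, and the frame counts assume a hypothetical rate of 25 frames per second (so 400 ms is 10 frames and 2 s is 50 frames). The point is only to show how a model that predicts a short horizon can be applied repeatedly, feeding its own output back as input, to cover a longer one.

```python
from typing import Callable, List, Sequence

# Hypothetical representation: one frame = flattened joint coordinates.
Pose = List[float]
ShortTermModel = Callable[[Sequence[Pose], int], List[Pose]]


def constant_velocity_predictor(history: Sequence[Pose], horizon: int) -> List[Pose]:
    """Stand-in for a trained short-term model (assumption, not from the paper).

    Extrapolates the velocity between the last two observed frames.
    """
    last, prev = history[-1], history[-2]
    velocity = [a - b for a, b in zip(last, prev)]
    return [
        [x + step * v for x, v in zip(last, velocity)]
        for step in range(1, horizon + 1)
    ]


def autoregressive_rollout(
    predict_short: ShortTermModel,
    observed: Sequence[Pose],
    short_horizon: int,
    total_horizon: int,
) -> List[Pose]:
    """Predict `total_horizon` frames using a model that only predicts `short_horizon`.

    Each predicted chunk is appended to the history, and the most recent
    frames (same window length as the original observation) are fed back in.
    """
    window = len(observed)
    history = [list(pose) for pose in observed]
    future: List[Pose] = []
    while len(future) < total_horizon:
        chunk = predict_short(history[-window:], short_horizon)
        future.extend(chunk)
        history.extend(chunk)
    return future[:total_horizon]


# Assumed 25 fps: 10 frames is about 400 ms, 50 frames is about 2 s.
observed_frames = [[0.0, 0.0], [1.0, 2.0]]
long_term = autoregressive_rollout(
    constant_velocity_predictor, observed_frames, short_horizon=10, total_horizon=50
)
```

In a rollout like this, errors in early chunks propagate to later ones, which is why it is notable that the paper reports the short-term-trained models still outperforming the baselines at the longer horizon.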

Code & Models

Repositories

crisie/udiva (official)

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications