Comparison of Spatio-Temporal Models for Human Motion and Pose Forecasting in Face-to-Face Interaction Scenarios
German Barquero, Johnny Núñez, Zhen Xu, Sergio Escalera, Wei-Wei Tu, Isabelle Guyon, and Cristina Palmero

TL;DR
This paper systematically compares spatio-temporal models for human motion and pose forecasting in face-to-face interactions. It shows that attention-based methods achieve state-of-the-art results, and that models trained for short-term prediction, applied autoregressively, outperform baselines on longer-term forecasts (up to 2 s), even with noisy annotations.
Contribution
The paper provides the first systematic comparison of state-of-the-art models for behavior forecasting in dyadic interactions, using the UDIVA v0.5 dataset.
Findings
Attention-based models achieve state-of-the-art performance.
Models trained for short-term prediction (<400 ms) and applied autoregressively outperform baselines at longer horizons (up to 2 s).
Robustness to noisy annotations suggests potential for weakly-supervised learning.
Abstract
Human behavior forecasting during human-human interactions is of utmost importance to provide robotic or virtual agents with social intelligence. This problem is especially challenging for scenarios that are highly driven by interpersonal dynamics. In this work, we present the first systematic comparison of state-of-the-art approaches for behavior forecasting. To do so, we leverage whole-body annotations (face, body, and hands) from the very recently released UDIVA v0.5, which features face-to-face dyadic interactions. Our best attention-based approaches achieve state-of-the-art performance in UDIVA v0.5. We show that by autoregressively predicting the future with methods trained for the short-term future (<400ms), we outperform the baselines even for a considerably longer-term future (up to 2s). We also show that this finding holds when highly noisy annotations are used, which opens…
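The abstract's central idea can be sketched as follows: train a forecaster for a short horizon, then roll it out autoregressively to reach a longer one. This is a minimal illustration under stated assumptions, not the authors' implementation. `autoregressive_rollout` and `constant_velocity_model` are hypothetical names. Each pose is assumed to be a flat list of keypoint coordinates. The toy constant-velocity predictor only stands in for the attention-based models the paper compares. Assuming a hypothetical 25 fps, a 9-frame block is 360 ms (under the 400 ms short-term limit) and a 50-frame horizon is 2 s.

```python
from typing import Callable, List, Sequence

# Hypothetical representation: one pose = flattened keypoint coordinates.
Frame = List[float]


def autoregressive_rollout(
    model: Callable[[Sequence[Frame]], List[Frame]],
    observed: Sequence[Frame],
    horizon: int,
    context: int,
) -> List[Frame]:
    """Hypothetical helper (not the paper's code).

    Extends a short-term forecaster to `horizon` future frames by feeding
    its own predictions back in as input, one short block at a time.
    """
    if len(observed) < context:
        raise ValueError("need at least `context` observed frames")
    history = list(observed)
    future: List[Frame] = []
    while len(future) < horizon:
        # Predict a short block from the most recent `context` frames.
        block = model(history[-context:])
        if not block:
            raise ValueError("model returned no frames")
        # Predictions become part of the input for the next step.
        history.extend(block)
        future.extend(block)
    # The last block may overshoot the horizon, so trim it.
    return future[:horizon]


def constant_velocity_model(window: Sequence[Frame], steps: int = 9) -> List[Frame]:
    """Toy stand-in for a trained short-term forecaster (hypothetical)."""
    last, prev = window[-1], window[-2]
    velocity = [a - b for a, b in zip(last, prev)]
    return [[x + v * (k + 1) for x, v in zip(last, velocity)] for k in range(steps)]


if __name__ == "__main__":
    # 20 observed frames of a single 2-D point moving along x.
    observed = [[float(t), 0.0] for t in range(20)]
    pred = autoregressive_rollout(constant_velocity_model, observed, horizon=50, context=20)
    print(len(pred), pred[-1])  # 50 [69.0, 0.0]
```

Because predictions are fed back as inputs, errors can compound over the horizon. The abstract nonetheless reports that short-term-trained models outperform the baselines up to 2 s ahead.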
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications
