Robust Human Motion Forecasting using Transformer-based Model
Esteve Valls Mascaro, Shuo Ma, Hyemin Ahn, Dongheui Lee

TL;DR
This paper introduces a lightweight, robust Transformer-based model for real-time 3D human motion forecasting that outperforms existing models in accuracy and efficiency, especially under occlusion and noisy conditions.
Contribution
The proposed 2-Channel Transformer (2CH-TR) is a novel model that effectively exploits spatio-temporal information for short and long-term human motion prediction, with improved robustness and speed.
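To make the idea of exploiting spatio-temporal information concrete, here is a minimal sketch of attention applied along two axes of a pose sequence. It is not the authors' implementation. This summary does not say how the two channels are defined or fused, so the sketch assumes one channel attends across frames, the other across joints, and that both are added back to the input. The names `self_attention` and `two_channel_block` are hypothetical, and the learned projections of a real Transformer are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    """Scaled dot-product self-attention with no learned weights.

    tokens: array of shape (n_tokens, d). Queries, keys, and values are all
    the raw tokens; a real model would use learned linear projections.
    """
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ tokens


def two_channel_block(seq):
    """Hypothetical two-channel block over a pose sequence.

    seq: array of shape (T, J, C) = (frames, joints, coordinates).
    """
    T, J, C = seq.shape

    # Assumed temporal channel: each frame is a token described by its full pose.
    temporal = self_attention(seq.reshape(T, J * C)).reshape(T, J, C)

    # Assumed spatial channel: each joint is a token described by its trajectory.
    joints_first = seq.transpose(1, 0, 2).reshape(J, T * C)
    spatial = self_attention(joints_first).reshape(J, T, C).transpose(1, 0, 2)

    # Assumed fusion: residual sum of both channels.
    return seq + temporal + spatial
```

A call such as `two_channel_block(np.random.default_rng(0).normal(size=(10, 17, 3)))` returns an array with the same `(10, 17, 3)` shape as its input; the frame and joint counts here are illustrative only.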
Findings
Outperforms ST-Transformer in accuracy
Reduces short-term mean squared error by 8.89%
Operates efficiently in noisy, occluded environments
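The findings above rest on two simple quantities: a relative reduction in mean squared error, and predictions evaluated when parts of the input are hidden. The sketch below shows one way these could be computed. The function names, the random per-joint dropout used to simulate occlusion, and the choice to zero out hidden joints are all assumptions for illustration, not the paper's evaluation protocol.

```python
import numpy as np


def mse(pred, target):
    """Mean squared error over all elements."""
    return float(np.mean((pred - target) ** 2))


def relative_reduction(baseline_err, model_err):
    """Percentage by which model_err is lower than baseline_err."""
    return 100.0 * (baseline_err - model_err) / baseline_err


def occlude(seq, drop_prob, rng):
    """Simulate occlusion by zeroing random joints in each frame (assumed scheme).

    seq: array of shape (T, J, C) = (frames, joints, coordinates).
    Returns the occluded sequence and a boolean (T, J) visibility mask.
    """
    T, J, _ = seq.shape
    visible = rng.random((T, J)) >= drop_prob
    return seq * visible[..., None], visible
```

For example, an error that drops from 2.0 to 1.5 gives `relative_reduction(2.0, 1.5) == 25.0`; a reported figure such as 8.89% would be read the same way, relative to whichever baseline error it is measured against.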
Abstract
Comprehending human motion is a fundamental challenge for developing Human-Robot Collaborative applications. Computer vision researchers have addressed this field by focusing only on reducing prediction error, without taking into account the requirements for deploying such models on robots. In this paper, we propose a new Transformer-based model that handles real-time 3D human motion forecasting in both the short and the long term. Our 2-Channel Transformer (2CH-TR) efficiently exploits the spatio-temporal information of a briefly observed sequence (400 ms) and achieves accuracy competitive with the current state of the art. 2CH-TR stands out for its efficiency, being lighter and faster than its competitors. In addition, our model is tested in conditions where the human motion is severely occluded, demonstrating its…
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Linear Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Adam · Position-Wise Feed-Forward Layer · Softmax
