Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation
Shuying Liu, Wenbin Wu, Jiaxian Wu, Yue Lin

TL;DR
This paper introduces a Spatial-Temporal Parallel Transformer that estimates arm and hand dynamics from monocular video by exploiting the correlation between arm poses and hand gestures, yielding more accurate and smoother motion estimates than previous methods.
Contribution
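The paper presents the Spatial-Temporal Parallel Arm-Hand Motion Transformer (PAHMT), which predicts arm and hand dynamics simultaneously. It also introduces new losses that encourage smooth and accurate estimates, and a motion capture dataset of 200K frames of hand gestures used for training.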
Findings
Outperforms previous state-of-the-art methods
Produces smooth and plausible arm-hand motion estimations
Demonstrates robustness in challenging scenarios
Abstract
We propose an approach to estimate arm and hand dynamics from monocular video by utilizing the relationship between arm and hand. Although monocular full human motion capture technologies have made great progress in recent years, recovering accurate and plausible arm twists and hand gestures from in-the-wild videos still remains a challenge. To solve this problem, our solution is proposed based on the fact that arm poses and hand gestures are highly correlated in most real situations. To fully exploit arm-hand correlation as well as inter-frame information, we carefully design a Spatial-Temporal Parallel Arm-Hand Motion Transformer (PAHMT) to predict the arm and hand dynamics simultaneously. We also introduce new losses to encourage the estimations to be smooth and accurate. Besides, we collect a motion capture dataset including 200K frames of hand gestures and use this data to train…
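The abstract mentions losses that encourage smooth and accurate estimates but does not define them here. As a rough illustration only, the sketch below combines a mean-squared-error accuracy term with a generic second-order temporal smoothness penalty. The function names, the `smooth_weight` parameter, and the choice of an acceleration penalty are all assumptions made for this example, not the paper's formulation.

```python
from typing import Sequence

# One frame of pose parameters (e.g. flattened joint rotations).
# This representation is an assumption made for illustration.
Pose = Sequence[float]


def accuracy_loss(pred: Sequence[Pose], target: Sequence[Pose]) -> float:
    """Mean squared error between predicted and reference poses."""
    total, count = 0.0, 0
    for p_frame, t_frame in zip(pred, target):
        for p, t in zip(p_frame, t_frame):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


def smoothness_loss(pred: Sequence[Pose]) -> float:
    """Mean squared second-order difference (acceleration) across frames."""
    total, count = 0.0, 0
    for prev, cur, nxt in zip(pred, pred[1:], pred[2:]):
        for a, b, c in zip(prev, cur, nxt):
            total += (c - 2.0 * b + a) ** 2
            count += 1
    return total / count if count else 0.0


def total_loss(pred: Sequence[Pose], target: Sequence[Pose],
               smooth_weight: float = 0.1) -> float:
    """Hypothetical weighted sum of the accuracy and smoothness terms."""
    return accuracy_loss(pred, target) + smooth_weight * smoothness_loss(pred)


# A sequence moving at constant velocity incurs no smoothness penalty,
# while a jittering sequence does.
steady = [[0.0], [1.0], [2.0], [3.0]]
jitter = [[0.0], [1.0], [0.0], [1.0]]
print(smoothness_loss(steady), smoothness_loss(jitter))
```

Penalizing acceleration rather than velocity leaves steady motion unpenalized and discourages only frame-to-frame jitter, which is one common way to trade smoothness against accuracy.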
Taxonomy
Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Human Motion and Animation
Methods: Attention Is All You Need · Linear Layer · Residual Connection · Softmax · Absolute Position Encodings · Layer Normalization · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Dense Connections
