Bidirectional Progressive Transformer for Interaction Intention Anticipation
Zichen Zhang, Hongchen Luo, Wei Zhai, Yang Cao, Yu Kang

TL;DR
This paper introduces a Bidirectional Progressive Transformer that jointly predicts hand trajectories and interaction hotspots, leveraging their inherent connection to improve accuracy and reduce error accumulation in interaction intention anticipation.
Contribution
The paper proposes a novel Bidirectional Progressive Transformer with mutual correction mechanisms and uncertainty modeling for improved interaction intention prediction.
Findings
Achieves state-of-the-art results on Epic-Kitchens-100, EGO4D, and EGTEA Gaze+ datasets.
Reduces the accumulation of prediction errors over time through mutual correction between hand trajectories and interaction hotspots.
Handles inherent randomness in human behavior with stochastic modeling.
Abstract
Interaction intention anticipation aims to jointly predict future hand trajectories and interaction hotspots. Existing research has often treated trajectory forecasting and interaction hotspot prediction as separate tasks, or has considered only the impact of trajectories on interaction hotspots, which leads to the accumulation of prediction errors over time. However, a deeper inherent connection exists between hand trajectories and interaction hotspots, which allows for continuous mutual correction between them. Building upon this relationship, a novel Bidirectional prOgressive Transformer (BOT), which introduces a Bidirectional Progressive mechanism into the anticipation of interaction intention, is established. Initially, BOT maximizes the utilization of spatial information from the last observation frame through the Spatial-Temporal Reconstruction Module, mitigating conflicts arising from…
Taxonomy
Topics: Emotion and Mood Recognition
Methods: Attention Is All You Need · Dropout · Label Smoothing · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Linear Layer · Byte Pair Encoding · Adam
