TAMFormer: Multi-Modal Transformer with Learned Attention Mask for Early Intent Prediction
Nada Osman, Guglielmo Camporese, Lamberto Ballan

TL;DR
TAMFormer is a multi-modal transformer with learned attention masks for early pedestrian intention prediction from urban-scene videos; it reports improved performance on several public benchmarks.
Contribution
The paper proposes a transformer model for early intention prediction whose attention masks are learned rather than fixed, so that present and past temporal dependencies can be weighted differently.
Findings
Improves prediction accuracy at various anticipation times.
Effectively encodes past observations for future activity prediction.
Outperforms previous methods on public benchmarks.
Abstract
Human intention prediction is a growing area of research in which a vision-based system has to anticipate an activity in a video. To this end, the model builds a representation of the past and then produces hypotheses about upcoming scenarios. In this work, we focus on early intention prediction for pedestrians: from the current observation of an urban scene, the model predicts the future activity of pedestrians approaching the street. Our method is based on a multi-modal transformer that encodes past observations and produces multiple predictions at different anticipation times. Moreover, we propose to learn the attention masks of our transformer-based model (Temporal Adaptive Mask Transformer) in order to weigh present and past temporal dependencies differently. We investigate our method on several public benchmarks for early intention prediction,…
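The abstract names two mechanisms: attention masks that are learned instead of fixed, and one model that emits predictions at several anticipation times. The NumPy sketch below illustrates one plausible reading of both. It is not the authors' implementation. Several choices here are assumptions:

* The learned mask is a trainable additive bias on the attention scores, combined with a hard causal mask.
* Modalities are fused by projecting each one and summing the results.
* The per-frame sigmoid output at step t is read as a prediction made with frames 0..t only, so earlier steps correspond to earlier anticipation times.

The names `LearnedMaskAttention`, `fuse_modalities` and `per_step_crossing_prob`, the choice of boxes and pose as inputs, and all sizes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class LearnedMaskAttention:
    """Hypothetical single-head causal self-attention with a learnable (T, T) mask.

    mask_logits[t, s] is an assumed trainable bias on how much step t attends
    to past step s; future steps (s > t) are always blocked.
    """

    def __init__(self, d_model, T, rng):
        s = 1.0 / np.sqrt(d_model)
        self.Wq = rng.normal(0.0, s, (d_model, d_model))
        self.Wk = rng.normal(0.0, s, (d_model, d_model))
        self.Wv = rng.normal(0.0, s, (d_model, d_model))
        # Zero init: starts as plain causal attention; training (not shown)
        # would adjust these logits to reweight present vs. past frames.
        self.mask_logits = np.zeros((T, T))
        self.causal = np.tril(np.ones((T, T), dtype=bool))

    def __call__(self, x):
        # x: (T, d_model), one fused feature vector per observed frame.
        q, k, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
        scores = q @ k.T / np.sqrt(x.shape[-1]) + self.mask_logits
        scores = np.where(self.causal, scores, -np.inf)
        attn = softmax(scores, axis=-1)
        return attn @ v, attn


def fuse_modalities(modalities, projections):
    # Project each modality to d_model and sum (one simple fusion choice).
    return sum(m @ W for m, W in zip(modalities, projections))


def per_step_crossing_prob(h, w_head, b_head):
    # One sigmoid output per time step; step t has seen only frames 0..t.
    return 1.0 / (1.0 + np.exp(-(h @ w_head + b_head)))


# Usage with random stand-in data (hypothetical shapes).
rng = np.random.default_rng(0)
T, d = 8, 16
boxes = rng.normal(size=(T, 4))   # e.g. a bounding-box track
pose = rng.normal(size=(T, 34))   # e.g. 17 keypoints x (x, y)
proj = [rng.normal(0.0, 0.1, (4, d)), rng.normal(0.0, 0.1, (34, d))]

layer = LearnedMaskAttention(d, T, rng)
h, attn = layer(fuse_modalities([boxes, pose], proj))
probs = per_step_crossing_prob(h, rng.normal(0.0, 0.1, d), 0.0)
print(probs.shape)  # (8,)
```

Initializing `mask_logits` at zero makes the layer behave exactly like standard causal attention before training. Any learned deviation then directly shows how much the model prefers recent frames over older ones.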
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Image Enhancement Techniques
