The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction
Alexandros Stergiou, Dima Damen

TL;DR
The paper introduces Temporal Progressive (TemPr), an attention model that captures how an action evolves in a partially observed video by sampling it at multiple fine-to-coarse scales. It reports state-of-the-art early action prediction performance on four video datasets.
Contribution
It proposes TemPr, a bottleneck-based multi-scale attention architecture for early action prediction with one attention tower per temporal scale. The final label comes from the collective agreement of the towers, taking their confidences into account.
Findings
State-of-the-art accuracy on four video datasets
Effective multi-scale attention capturing action evolution
Robust performance across various encoder architectures
Abstract
Early action prediction deals with inferring the ongoing action from partially-observed videos, typically at the outset of the video. We propose a bottleneck-based attention model that captures the evolution of the action, through progressive sampling over fine-to-coarse scales. Our proposed Temporal Progressive (TemPr) model is composed of multiple attention towers, one for each scale. The predicted action label is based on the collective agreement considering confidences of these towers. Extensive experiments over four video datasets showcase state-of-the-art performance on the task of Early Action Prediction across a range of encoder architectures. We demonstrate the effectiveness and consistency of TemPr through detailed ablations.
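The abstract describes two mechanisms: progressive sampling over fine-to-coarse scales, and a label decided by the collective agreement of per-scale towers weighted by their confidences. The sketch below illustrates one plausible reading of both ideas in plain Python. It is not the authors' implementation. The function names (`progressive_scales`, `aggregate_towers`), the choice of nested windows that all start at the first frame, and the use of each tower's maximum softmax probability as its confidence are assumptions made for illustration only.

```python
import math


def progressive_scales(num_observed_frames, num_scales):
    """Return (start, end) frame windows ordered from fine to coarse.

    Assumption: scale i covers the first (i + 1) / num_scales of the
    observed frames, so the last window spans everything observed.
    """
    windows = []
    for i in range(num_scales):
        # Ceiling division keeps every window non-empty.
        end = -(-num_observed_frames * (i + 1) // num_scales)
        windows.append((0, end))
    return windows


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_towers(tower_logits):
    """Combine per-tower class logits into one probability vector.

    Assumption: a tower's confidence is its maximum class probability,
    and the combined prediction is the confidence-weighted average of
    the towers' probability vectors.
    """
    probs = [softmax(logits) for logits in tower_logits]
    confidences = [max(p) for p in probs]
    total_conf = sum(confidences)
    num_classes = len(probs[0])
    return [
        sum(c * p[k] for c, p in zip(confidences, probs)) / total_conf
        for k in range(num_classes)
    ]


def predict(tower_logits):
    """Return the index of the class with the highest combined score."""
    combined = aggregate_towers(tower_logits)
    return max(range(len(combined)), key=combined.__getitem__)
```

In this sketch, a tower that is nearly undecided between classes contributes less to the combined score than a tower with a sharply peaked distribution, which is one simple way to let confident scales dominate the vote.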
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods
