Inductive Attention for Video Action Anticipation
Tsung-Ming Tai, Giuseppe Fiameni, Cheng-Kuang Lee, Simon See, Oswald Lanz

TL;DR
This paper introduces IAM, an inductive attention model that uses current prediction priors as queries to better anticipate future actions in videos, outperforming existing models on large-scale datasets.
Contribution
The paper proposes a novel inductive attention mechanism that leverages prediction priors as queries, improving future action inference in video understanding tasks.
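The idea of using a prediction prior as the attention query can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not specify the architecture, so the function name `inductive_attention_step`, the projection matrices `w_query` and `w_out`, and the use of plain scaled dot-product attention are all assumptions made for illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    # Multiply a (rows x cols) matrix by a length-cols vector.
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def inductive_attention_step(prior, memory, w_query, w_out):
    """Hypothetical single step: the current prediction prior is projected
    into a query that attends over past frame features (the memory)."""
    query = matvec(w_query, prior)                    # prior -> query vector
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, feat)) / scale
              for feat in memory]                     # one score per past frame
    weights = softmax(scores)                         # attention over frames
    context = [sum(w * feat[i] for w, feat in zip(weights, memory))
               for i in range(len(query))]            # weighted frame summary
    posterior = softmax(matvec(w_out, context))       # updated action distribution
    return posterior, weights

# Toy example: 3 action classes, 3 past frames, 2-dimensional features.
prior = [0.5, 0.3, 0.2]
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_query = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]          # assumed projection (2 x 3)
w_out = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]          # assumed classifier (3 x 2)
posterior, weights = inductive_attention_step(prior, memory, w_query, w_out)
```

In this sketch the returned `posterior` could be fed back in as the `prior` for the next step, which is one plausible reading of "inductive"; the paper itself should be consulted for the actual recurrence.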
Findings
Outperforms state-of-the-art models on egocentric video datasets
Uses fewer parameters than existing methods
Effectively models uncertainty in future action prediction
Abstract
Anticipating future actions based on spatiotemporal observations is essential in video understanding and predictive computer vision. Moreover, a model capable of anticipating the future has important applications: it can help precautionary systems react before an event occurs. However, unlike in the action recognition task, future information is inaccessible at observation time -- a model cannot directly map the video frames to the target action to solve the anticipation task. Instead, temporal inference is required to associate the relevant evidence with possible future actions. Consequently, existing solutions based on action recognition models are suboptimal. Recently, researchers proposed extending the observation window to capture longer pre-action profiles from past moments and leveraging attention to retrieve the subtle evidence to improve the anticipation…
Taxonomy
Topics: Human Pose and Action Recognition · Visual Attention and Saliency Detection · Advanced Neural Network Applications
