When will you do what? - Anticipating Temporal Occurrences of Activities
Yazan Abu Farha, Alexander Richard, Juergen Gall

TL;DR
This paper introduces CNN- and RNN-based methods for long-term prediction of future human actions and their durations in videos, addressing a gap left by existing approaches, which focus on the very near future.
Contribution
It presents two deep learning models that predict a large number of future actions and their durations in videos, even when the input is noisy and the videos contain many different actions.
Findings
The methods produce accurate long-term predictions of future actions.
The models cope with noisy or erroneous input.
The methods remain effective on long videos with many different actions.
Abstract
Analyzing human actions in videos has gained increased attention recently. While most works focus on classifying and labeling observed video frames or anticipating the very recent future, making long-term predictions over more than just a few seconds is a task with many practical applications that has not yet been addressed. In this paper, we propose two methods to predict a considerably large number of future actions and their durations. Both a CNN and an RNN are trained to learn future video labels based on previously seen content. We show that our methods generate accurate predictions of the future even for long videos with a huge number of different actions and can even deal with noisy or erroneous input information.
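The task described in the abstract, predicting future actions and their durations from previously seen content, can be illustrated with a minimal sketch of the data representation alone. This is not the paper's CNN or RNN. The function names, the action labels, and the 50% observation split below are all hypothetical, chosen only to show what "observed" and "future" label sequences might look like.

```python
from itertools import groupby


def to_segments(frame_labels):
    """Collapse frame-wise labels into (action, duration_in_frames) pairs."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(frame_labels)]


def split_observed_future(frame_labels, observed_fraction):
    """Split a video's frame labels into an observed part and a future part."""
    cut = int(len(frame_labels) * observed_fraction)
    return frame_labels[:cut], frame_labels[cut:]


# Hypothetical 10-frame video with three actions (labels are made up).
frames = ["pour_milk"] * 4 + ["stir"] * 3 + ["drink"] * 3

# Assume the first half of the video has been observed.
observed, future = split_observed_future(frames, 0.5)

observed_segments = to_segments(observed)  # what a model would see
future_segments = to_segments(future)      # what a model would have to predict

print(observed_segments)
print(future_segments)
```

In this toy setup, a prediction model would take something like `observed_segments` as input and be scored on how well its output matches `future_segments`, either as segments or expanded back to frame-wise labels.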
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods
