Rethinking Learning Approaches for Long-Term Action Anticipation
Megha Nawhal, Akash Abdu Jyothi, Greg Mori

TL;DR
This paper introduces ANTICIPATR, a transformer-based model that improves long-term action anticipation by combining segment-level and video-level representations, leading to better future action predictions.
Contribution
The paper proposes a two-stage training method and a transformer-based model that uses both segment-level and video-level representations for long-term action anticipation.
Findings
ANTICIPATR outperforms existing methods on multiple datasets.
It predicts future actions effectively across a range of anticipation durations.
Segment-level representations are shown to benefit anticipation tasks.
Abstract
Action anticipation involves predicting future actions having observed the initial portion of a video. Typically, the observed video is processed as a whole to obtain a video-level representation of the ongoing activity in the video, which is then used for future prediction. We introduce ANTICIPATR, which performs long-term action anticipation by leveraging segment-level representations learned using individual segments from different activities, in addition to a video-level representation. We propose a two-stage learning approach to train a novel transformer-based model that uses these two types of representations to directly predict a set of future action instances over any given anticipation duration. Results on Breakfast, 50Salads, Epic-Kitchens-55, and EGTEA Gaze+ datasets demonstrate the effectiveness of our approach.
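To make the task formulation in the abstract concrete, the following is a minimal sketch of what "a set of future action instances over a given anticipation duration" could look like as a prediction target. It is not the authors' code: the `ActionInstance` type, the `future_action_set` helper, the example annotations, and the choice to clip an ongoing action to the anticipation window are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ActionInstance:
    """Hypothetical container for one action instance (not from the paper)."""
    label: str    # action class name
    start: float  # start time in seconds
    end: float    # end time in seconds


def future_action_set(
    annotations: List[ActionInstance],
    observed_until: float,
    anticipation_duration: float,
) -> List[ActionInstance]:
    """Return the instances overlapping the anticipation window.

    The window runs from `observed_until` to
    `observed_until + anticipation_duration`. Instances that only partly
    overlap are clipped to the window (an assumption of this sketch).
    """
    window_start = observed_until
    window_end = observed_until + anticipation_duration
    targets = []
    for a in annotations:
        start = max(a.start, window_start)
        end = min(a.end, window_end)
        if end > start:  # keep only instances with non-empty overlap
            targets.append(ActionInstance(a.label, start, end))
    return targets


# Made-up annotations for a single video, purely illustrative.
video = [
    ActionInstance("take cup", 0.0, 4.0),
    ActionInstance("pour coffee", 4.0, 10.0),
    ActionInstance("add sugar", 10.0, 15.0),
    ActionInstance("stir", 15.0, 22.0),
]

# Observe the first 6 seconds, then anticipate the next 10 seconds.
targets = future_action_set(video, observed_until=6.0, anticipation_duration=10.0)
for t in targets:
    print(t.label, t.start, t.end)
```

Because the target is a set of instances rather than a fixed-length sequence of frame labels, the same helper works for any anticipation duration by changing one argument.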
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
