Rethinking Learning Approaches for Long-Term Action Anticipation

Megha Nawhal; Akash Abdu Jyothi; Greg Mori

arXiv:2210.11566 · cs.CV · October 24, 2022


TL;DR

This paper introduces ANTICIPATR, a transformer-based model for long-term action anticipation that combines segment-level and video-level representations of the observed video to predict future actions.

Contribution

The paper proposes a two-stage learning approach and a transformer-based model that uses both segment-level and video-level representations for long-term action anticipation.

Findings

01 Outperforms existing methods on multiple datasets.

02 Effectively predicts future actions over various anticipation durations.

03 Demonstrates the benefit of segment-level representations in anticipation tasks.

Abstract

Action anticipation involves predicting future actions having observed the initial portion of a video. Typically, the observed video is processed as a whole to obtain a video-level representation of the ongoing activity in the video, which is then used for future prediction. We introduce ANTICIPATR, which performs long-term action anticipation leveraging segment-level representations learned using individual segments from different activities, in addition to a video-level representation. We propose a two-stage learning approach to train a novel transformer-based model that uses these two types of representations to directly predict a set of future action instances over any given anticipation duration. Results on Breakfast, 50Salads, Epic-Kitchens-55, and EGTEA Gaze+ datasets demonstrate the effectiveness of our approach.
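To make the phrase "a set of future action instances over any given anticipation duration" concrete, here is a minimal Python sketch of one way such an output could be represented and queried. It is an illustration only, not the authors' implementation: the (label, start, end) representation and the names ActionInstance, clip_to_duration, and to_timeline are hypothetical and do not come from the paper or its repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ActionInstance:
    """Hypothetical container for one predicted future action."""
    label: str
    start: float  # seconds after the end of the observed video (assumed convention)
    end: float


def clip_to_duration(instances: List[ActionInstance], duration: float) -> List[ActionInstance]:
    """Keep only the part of each instance that lies inside [0, duration]."""
    clipped = []
    for inst in instances:
        start = max(inst.start, 0.0)
        end = min(inst.end, duration)
        if end > start:  # drop instances entirely outside the window
            clipped.append(ActionInstance(inst.label, start, end))
    return sorted(clipped, key=lambda a: a.start)


def to_timeline(instances: List[ActionInstance], duration: float,
                step: float = 1.0, background: str = "none") -> List[str]:
    """Sample one label per step of the anticipation window, at each step's midpoint."""
    timeline = []
    for i in range(int(duration / step)):
        t = (i + 0.5) * step
        label = next((a.label for a in instances if a.start <= t < a.end), background)
        timeline.append(label)
    return timeline


# Made-up predictions, for illustration only.
predicted = [ActionInstance("pour milk", 0.0, 4.0), ActionInstance("stir", 4.0, 10.0)]
print(clip_to_duration(predicted, 6.0))
print(to_timeline(predicted, 6.0, step=2.0))
```

Because the instances carry their own start and end times, the same set of predictions can be read out for a shorter or longer anticipation window without re-running a model, which is one plausible reading of "any given anticipation duration".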


Code & Models

Repositories

nmegha2601/anticipatr (PyTorch, official)


Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications