Predicting the Next Action by Modeling the Abstract Goal

Debaditya Roy; Basura Fernando

arXiv:2209.05044 · cs.CV · August 22, 2024 · 6 cites

Open Access

TL;DR

This paper introduces an action anticipation model that infers an abstract goal from visual representations, samples multiple candidates for the next action, and keeps the candidate most consistent with that goal, setting new state-of-the-art results on EK55 and EGTEA Gaze+.

Contribution

The paper models the abstract goal as a distribution whose parameters are estimated by a variational recurrent network, improving human action anticipation without requiring explicit goal information at inference.

Findings

01

Improved Top-1 verb accuracy by 13.69% on EK55 seen kitchens

02

Improved Top-1 noun accuracy by 13.1% on EGTEA Gaze+

03

Set new state-of-the-art results on EK55 and EGTEA Gaze+ datasets

Abstract

The problem of anticipating human actions is an inherently uncertain one. However, we can reduce this uncertainty if we have a sense of the goal that the actor is trying to achieve. Here, we present an action anticipation model that leverages goal information for the purpose of reducing the uncertainty in future predictions. Since we do not possess goal information or the observed actions during inference, we resort to visual representation to encapsulate information about both actions and goals. Through this, we derive a novel concept called abstract goal which is conditioned on observed sequences of visual features for action anticipation. We design the abstract goal as a distribution whose parameters are estimated using a variational recurrent network. We sample multiple candidates for the next action and introduce a goal consistency measure to determine the best candidate that…
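The sample-then-select idea in the abstract can be illustrated with a toy sketch. It assumes the abstract goal is a diagonal Gaussian given by `mean` and `std`, draws several goal samples, maps each sample to a candidate action, and keeps the candidate with the highest consistency score. Everything concrete here is an assumption for illustration: in the paper the distribution parameters come from a variational recurrent network, which is not reproduced, and the `prototypes` dictionary, the cosine-based score, and all function names are hypothetical stand-ins rather than the authors' goal consistency measure.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def sample_goal(mean, std, rng):
    """Draw one goal sample z = mean + std * eps, with eps ~ N(0, 1) per dimension."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


def pick_next_action(mean, std, prototypes, num_candidates=5, seed=0):
    """Sample several candidates and return (label, score) of the most consistent one.

    `prototypes` maps an action label to a hypothetical feature vector.
    The cosine score is an assumed stand-in, not the paper's measure.
    """
    rng = random.Random(seed)
    best_label, best_score = None, -math.inf
    for _ in range(num_candidates):
        z = sample_goal(mean, std, rng)
        # Candidate: the action whose prototype is closest to the sampled goal.
        label = max(prototypes, key=lambda name: cosine(z, prototypes[name]))
        # Stand-in consistency: agreement between the candidate and the goal mean.
        score = cosine(prototypes[label], mean)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score
```

Calling `pick_next_action([1.0, 0.0], [0.1, 0.1], {"cut": [1.0, 0.0], "wash": [0.0, 1.0]})` returns the label and score of the best candidate. A larger `std` spreads the samples and makes the selection step matter more.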

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Visual Attention and Saliency Detection