Anticipating Next Active Objects for Egocentric Videos

Sanket Thakur; Cigdem Beyan; Pietro Morerio; Vittorio Murino; Alessio Del Bue

arXiv:2302.06358·cs.CV·May 2, 2024

Open Access

TL;DR

This paper introduces ANACTO, a transformer-based framework for predicting the location of the next active object in egocentric videos before contact occurs, addressing a novel problem in first-person action anticipation.

Contribution

It proposes the first method to anticipate the next active object and its location in egocentric videos, using a self-attention transformer model and providing new annotations for multiple datasets.

Findings

01. The proposed method outperforms baseline approaches on three datasets.

02. The approach effectively predicts the next active object before contact.

03. Ablation studies confirm the importance of the proposed components.

Abstract

This paper addresses the problem of anticipating the next-active-object location in the future, for a given egocentric video clip where the contact might happen, before any action takes place. The problem is considerably hard, as we aim at estimating the position of such objects in a scenario where the observed clip and the action segment are separated by the so-called "time to contact" (TTC) segment. Many methods have been proposed to anticipate the action of a person based on previous hand movements and interactions with the surroundings. However, there have been no attempts to investigate the next possible interactable object, and its future location with respect to the first-person's motion and the field-of-view drift during the TTC window. We define this as the task of Anticipating the Next ACTive Object (ANACTO). To this end, we propose a transformer-based self-attention…

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications

Methods: Softmax · Linear Layer · Multi-Head Attention · Dense Connections · Attention Is All You Need · Residual Connection · Layer Normalization · Vision Transformer