Guided Attention for Next Active Object @ EGO4D STA Challenge

Sanket Thakur; Cigdem Beyan; Pietro Morerio; Vittorio Murino; Alessio Del Bue

arXiv:2305.16066 · cs.CV · October 5, 2023 · 1 citation

TL;DR

This paper presents a Guided-Attention-based model for short-term object interaction anticipation in egocentric videos. It improves validation performance and achieves state-of-the-art results on the EGO4D challenge test set.

Contribution

It introduces a novel Guided-Attention mechanism integrated with StillFast for enhanced spatiotemporal feature extraction in egocentric video anticipation.

Findings

1. Achieved state-of-the-art results on the EGO4D challenge test set.

2. Improved validation performance over baseline models.

3. Enhanced motion and contextual understanding in video anticipation.

Abstract

In this technical report, we describe our Guided-Attention-based solution for the EGO4D short-term anticipation (STA) challenge. It combines object detections with the spatiotemporal features extracted from video clips, enhancing the motion and contextual information, and then decodes the object-centric and motion-centric information to address STA in egocentric videos. For the challenge, we build our model on top of StillFast, with Guided Attention applied to the fast network. Our model obtains better performance on the validation set and also achieves state-of-the-art (SOTA) results on the test set of the EGO4D Short-Term Object Interaction Anticipation Challenge.
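The abstract's idea of combining object detections with spatiotemporal video features can be illustrated with a small cross-attention sketch. This is not the authors' implementation: it is a dependency-free Python illustration that assumes video tokens act as queries and object-detection embeddings act as keys and values, with no learned projections and a simple residual connection. The names `guided_attention`, `video_tokens`, and `object_tokens` are hypothetical, and the official repository uses PyTorch rather than plain lists.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def guided_attention(video_tokens, object_tokens):
    """Scaled dot-product cross-attention (illustrative assumption).

    Each video token (query) attends over all object tokens (keys and
    values). The attended object context is added back to the video
    token as a residual. Returns (guided_tokens, attention_weights).
    """
    dim = len(video_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    guided, weights = [], []
    for query in video_tokens:
        # Similarity between this video token and every object token.
        scores = [
            scale * sum(q * k for q, k in zip(query, key))
            for key in object_tokens
        ]
        attn = softmax(scores)
        # Weighted sum of object tokens, one value per feature dimension.
        context = [
            sum(w * obj[d] for w, obj in zip(attn, object_tokens))
            for d in range(dim)
        ]
        guided.append([q + c for q, c in zip(query, context)])
        weights.append(attn)
    return guided, weights


# Toy inputs: two video tokens and three object embeddings (made-up values).
video_tokens = [[1.0, 0.0], [0.0, 1.0]]
object_tokens = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
guided, attn = guided_attention(video_tokens, object_tokens)
```

In this toy setup each video token places the most weight on the object embedding it aligns with, which is the sense in which detections "guide" the video features. A real model would add learned query, key, and value projections and multiple heads.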

Code & Models

Repositories

sanketsans/ganov2 (PyTorch, official)

Models

sanketsans/ganov2 (Hugging Face model)

Taxonomy

Topics: Human Pose and Action Recognition · Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods
