AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation

Lorenzo Mur-Labadia; Ruben Martinez-Cantin; Josechu Guerrero; Giovanni Maria Farinella; Antonino Furnari

arXiv:2406.01194·cs.CV·June 6, 2024

TL;DR

This paper introduces STAformer, an attention-based model with affordance grounding modules, significantly improving short-term object interaction anticipation in egocentric videos for better human-robot interaction.

Contribution

The paper presents a novel attention-based architecture and two modules for modeling affordances, enhancing the accuracy of short-term interaction predictions from image-video pairs.

Findings

01. Up to +45% improvement in Top-5 mAP on Ego4D

02. Up to +42% improvement on curated EPIC-Kitchens dataset

03. Effective grounding of predictions using affordance modeling

Abstract

Short-Term object-interaction Anticipation (STA) consists of detecting the location of the next-active objects, the noun and verb categories of the interaction, and the time to contact from the observation of egocentric video. This ability is fundamental for wearable assistants or human-robot interaction to understand the user's goals, but there is still room for improvement to perform STA in a precise and reliable way. In this work, we improve the performance of STA predictions with two contributions: 1. We propose STAformer, a novel attention-based architecture integrating frame-guided temporal pooling, dual image-video attention, and multiscale feature fusion to support STA predictions from an image-input video pair. 2. We introduce two novel modules to ground STA predictions on human behavior by modeling affordances. First, we integrate an environment affordance model which acts as a…
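To make the task definition in the abstract concrete, the following is a minimal sketch of what a single STA prediction might carry: a box for the next-active object, a noun, a verb, and a time to contact. The class name `STAPrediction`, the field names, the box convention, the `top_k` helper, and all sample values are hypothetical and chosen only for illustration; they are not taken from the paper or its repository.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class STAPrediction:
    """One anticipated interaction (hypothetical container, not the authors' API)."""
    box: Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) in pixels
    noun: str                # category of the next-active object
    verb: str                # category of the anticipated interaction
    time_to_contact: float   # assumed to be in seconds
    score: float             # assumed confidence in [0, 1]


def top_k(predictions: List[STAPrediction], k: int = 5) -> List[STAPrediction]:
    """Keep the k highest-scoring predictions, as a Top-5 style metric would consider."""
    return sorted(predictions, key=lambda p: p.score, reverse=True)[:k]


# Made-up sample values, purely illustrative.
preds = [
    STAPrediction((10.0, 20.0, 110.0, 220.0), "cup", "take", 0.8, 0.61),
    STAPrediction((300.0, 40.0, 420.0, 180.0), "knife", "take", 1.5, 0.87),
    STAPrediction((50.0, 60.0, 90.0, 120.0), "tap", "turn_on", 0.4, 0.33),
]

best = top_k(preds, k=2)
for p in best:
    print(f"{p.verb} {p.noun} in ~{p.time_to_contact:.1f}s at {p.box} (score {p.score:.2f})")
```

Grouping the four outputs in one immutable record reflects that the task, as described, scores them jointly per detected object rather than as separate classification problems.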

Code & Models

Repositories

lmur98/AFFttention (PyTorch, official)

Taxonomy

Topics: Social Robot Interaction and HRI · Reinforcement Learning in Robotics · Human Pose and Action Recognition

Methods: Sparse Evolutionary Training