SlowFast Rolling-Unrolling LSTMs for Action Anticipation in Egocentric Videos

Nada Osman; Guglielmo Camporese; Pasquale Coscia; Lamberto Ballan

arXiv:2109.00829 · cs.CV · September 3, 2021

TL;DR

This paper introduces a novel attention-based method that enhances action anticipation in egocentric videos by simultaneously evaluating slow and fast features across multiple modalities, improving prediction accuracy.

Contribution

It extends the RULSTM architecture with a new attention mechanism and multi-scale processing, leading to better anticipation performance on egocentric video datasets.

Findings

01 Improved Top-5 accuracy on the EpicKitchens-55 dataset

02 Enhanced prediction at various anticipation times

03 Effective multi-modal, multi-scale feature evaluation
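
For readers unfamiliar with the metric named in the first finding, the following is a minimal sketch of the standard definition of Top-k accuracy: a prediction counts as correct when the true action is among the k highest-scoring classes. The function name `top_k_accuracy` and the toy scores are illustrative assumptions, not the paper's evaluation code, and the paper's exact protocol is not described on this page.

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-sample lists of class scores (hypothetical model outputs)
    labels: list of true class indices, one per sample
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices by score, highest first, and keep the first k.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example with 3 classes and 2 samples (values are made up).
toy_scores = [[0.10, 0.90, 0.00],
              [0.80, 0.05, 0.15]]
toy_labels = [1, 2]
top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

With many fine-grained action classes, Top-5 is more forgiving than Top-1, which is one reason it is commonly reported for anticipation tasks where several future actions are plausible.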

Abstract

Action anticipation in egocentric videos is a difficult task due to the inherently multi-modal nature of human actions. Additionally, some actions happen faster or slower than others depending on the actor or surrounding context, which could vary each time and lead to different predictions. Based on this idea, we build upon the RULSTM architecture, which is specifically designed for anticipating human actions, and propose a novel attention-based technique to evaluate, simultaneously, slow and fast features extracted from three different modalities, namely RGB, optical flow, and extracted objects. Two branches process information at different time scales, i.e., frame-rates, and several fusion schemes are considered to improve prediction accuracy. We perform extensive experiments on the EpicKitchens-55 and EGTEA Gaze+ datasets, and demonstrate that our technique systematically improves the results…
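
The two ideas in the abstract, sampling the same clip at two frame rates and fusing branch predictions with attention weights, can be illustrated with a rough sketch. Everything here is an assumption made for illustration: the helper names (`subsample`, `attention_fuse`), the strides, and the hand-set attention logits. In the paper the weights come from a learned attention module, and the fusion schemes it compares are not detailed on this page.

```python
import math


def subsample(frames, stride):
    """Keep every `stride`-th frame, counting back from the most recent one."""
    return frames[::-1][::stride][::-1]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(branch_scores, attention_logits):
    """Weighted sum of per-branch class scores.

    branch_scores: one list of class scores per branch
                   (e.g. slow/fast x RGB/flow/objects = 6 branches)
    attention_logits: one scalar per branch; placeholders here,
                      produced by a learned network in a real model
    """
    weights = softmax(attention_logits)
    num_classes = len(branch_scores[0])
    fused = [0.0] * num_classes
    for weight, scores in zip(weights, branch_scores):
        for c in range(num_classes):
            fused[c] += weight * scores[c]
    return fused, weights


# Two time scales over the same 12-frame clip (strides are hypothetical).
frames = list(range(12))
fast_frames = subsample(frames, 1)   # every frame
slow_frames = subsample(frames, 4)   # every fourth frame

# Made-up class scores for two branches over two action classes.
fused, weights = attention_fuse([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

Equal logits give equal weights, so the fused scores are a plain average of the branches; unequal logits let the model lean on whichever time scale or modality is more informative for a given sample.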

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Video Surveillance and Tracking Methods