Modeling long-term interactions to enhance action recognition

Alejandro Cartas; Petia Radeva; Mariella Dimiccoli

arXiv:2104.11520·cs.CV·April 26, 2021

TL;DR

This paper introduces a hierarchical LSTM-based method that leverages object interaction semantics at frame and temporal levels to improve egocentric action recognition, outperforming existing methods without using motion cues.

Contribution

It presents a novel hierarchical LSTM architecture combined with a region-based CNN for enhanced understanding of actions in egocentric videos, emphasizing object interaction semantics.
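To make the two-level idea concrete, here is a minimal pure-Python sketch of a hierarchical recurrence. It assumes a lower LSTM that summarizes the frames inside each shot and an upper LSTM that runs over those shot summaries. The names `LSTMCell` and `hierarchical_encode`, the layer sizes, and the exact wiring between levels are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Plain LSTM cell with randomly initialised weights (hypothetical helper)."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random):
        self.hidden_size = hidden_size
        n_in = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.weights = [
            [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(hidden_size)]
            for _ in range(4)
        ]
        self.biases = [[0.0] * hidden_size for _ in range(4)]

    def zero_state(self) -> Tuple[Vector, Vector]:
        return [0.0] * self.hidden_size, [0.0] * self.hidden_size

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = x + h  # concatenate the input with the previous hidden state
        pre = [
            [sum(w * v for w, v in zip(row, z)) + b for row, b in zip(matrix, bias)]
            for matrix, bias in zip(self.weights, self.biases)
        ]
        i = [sigmoid(v) for v in pre[0]]
        f = [sigmoid(v) for v in pre[1]]
        o = [sigmoid(v) for v in pre[2]]
        g = [math.tanh(v) for v in pre[3]]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def run(self, sequence: List[Vector]) -> Vector:
        """Return the final hidden state after consuming the whole sequence."""
        h, c = self.zero_state()
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


def hierarchical_encode(
    shots: List[List[Vector]], frame_lstm: LSTMCell, shot_lstm: LSTMCell
) -> List[Vector]:
    """Lower level: one summary per shot. Upper level: one state per shot,
    carrying context across shots (assumed wiring)."""
    summaries = [frame_lstm.run(frames) for frames in shots]
    h, c = shot_lstm.zero_state()
    outputs = []
    for summary in summaries:
        h, c = shot_lstm.step(summary, h, c)
        outputs.append(h)
    return outputs
```

In a real system the per-frame inputs would be the region-based CNN outputs and each upper-level state would feed an action classifier; here the inputs are just small float vectors.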

Findings

01. Outperforms state-of-the-art on standard benchmarks

02. Both frame-level and temporal-level HLSTM contribute to accuracy

03. Does not rely on motion information for recognition

Abstract

In this paper, we propose a new approach to understand actions in egocentric videos that exploits the semantics of object interactions at both frame and temporal levels. At the frame level, we use a region-based approach that takes as input a primary region roughly corresponding to the user's hands and a set of secondary regions potentially corresponding to the interacting objects, and calculates the action score through a CNN formulation. This information is then fed to a Hierarchical Long Short-Term Memory Network (HLSTM) that captures temporal dependencies between actions within and across shots. Ablation studies thoroughly validate the proposed approach, showing in particular that both levels of the HLSTM architecture contribute to performance improvement. Furthermore, quantitative comparisons show that the proposed approach outperforms the state-of-the-art in terms of action…
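As a rough illustration of the frame-level step, the sketch below combines per-action scores from one primary region with those of the best-scoring secondary region. The abstract does not give the exact formulation, so the additive "primary plus maximum over secondary regions" rule, the function names, and the example action labels are all assumptions made for illustration.

```python
import math
from typing import Dict, List

Scores = Dict[str, float]


def frame_action_scores(primary: Scores, secondary: List[Scores]) -> Scores:
    """Per action: primary-region score plus the best secondary-region score.

    Assumed combination rule; with no secondary regions, only the primary
    score is used.
    """
    combined = {}
    for action, primary_score in primary.items():
        best_secondary = max((region[action] for region in secondary), default=0.0)
        combined[action] = primary_score + best_secondary
    return combined


def softmax(scores: Scores) -> Scores:
    """Turn raw action scores into a probability distribution."""
    top = max(scores.values())
    exps = {action: math.exp(s - top) for action, s in scores.items()}
    total = sum(exps.values())
    return {action: e / total for action, e in exps.items()}


# Hypothetical scores for a hands region and two candidate object regions.
primary = {"take": 1.0, "pour": 0.5}
secondary = [{"take": 0.2, "pour": 2.0}, {"take": 0.7, "pour": -1.0}]
probabilities = softmax(frame_action_scores(primary, secondary))
```

Taking the maximum lets the most informative object region decide each action's score, so in this example the first secondary region drives "pour" while the second drives "take".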

Taxonomy

Methods: Memory Network