THORN: Temporal Human-Object Relation Network for Action Recognition

Mohammed Guermal, Rui Dai, and Francois Bremond

arXiv:2204.09468 · cs.CV · April 21, 2022

TL;DR

THORN is an end-to-end neural network that models human-object and object-object interactions to improve action recognition, achieving state-of-the-art results on challenging first-person datasets.

Contribution

The paper introduces THORN, a novel model that leverages human-object interactions and object relations for enhanced action recognition.

Findings

1. Achieves state-of-the-art performance on EPIC-Kitchen55.

2. Effectively models object relations and interactions.

3. Robust across first-person human-object interaction datasets.

Abstract

Most action recognition models treat human activities as unitary events. However, human activities often follow a hierarchy: many are compositional, and most are human-object interactions. In this paper, we propose to recognize human actions by leveraging the set of interactions that define an action. We present THORN, an end-to-end network that leverages important human-object and object-object interactions to predict actions. The model is built on top of a 3D backbone network. Its key components are: 1) an object representation filter for modeling objects; 2) an object relation reasoning module to capture object relations; 3) a classification layer to predict the action labels. To show the robustness of THORN, we evaluate it on EPIC-Kitchen55 and EGTEA Gaze+, two of the largest and most challenging…
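The abstract names three components but does not say how they are implemented, so the following is only a rough sketch of how such a pipeline could be wired together, not the authors' method. Everything in it is an assumption made for illustration: the random `feature_map` stands in for the output of a 3D backbone, `masked_average_pool` is a hypothetical stand-in for the object representation filter, `relate_objects` uses plain dot-product self-attention as a placeholder for the relation reasoning module, and `classify` is a generic linear layer with softmax.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def masked_average_pool(feature_map, mask):
    """Assumed object filter: average the feature rows selected by a 0/1 mask."""
    channels = len(feature_map[0])
    total = sum(mask)
    if total == 0:
        return [0.0] * channels
    return [
        sum(m * row[c] for m, row in zip(mask, feature_map)) / total
        for c in range(channels)
    ]


def relate_objects(tokens):
    """Assumed relation module: dot-product self-attention over object tokens."""
    dim = len(tokens[0])
    related = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        related.append(
            [sum(w * tok[c] for w, tok in zip(weights, tokens)) for c in range(dim)]
        )
    return related


def classify(tokens, class_weights):
    """Assumed classifier: mean-pool the tokens, then linear layer plus softmax."""
    dim = len(tokens[0])
    pooled = [sum(tok[c] for tok in tokens) / len(tokens) for c in range(dim)]
    logits = [sum(w * x for w, x in zip(row, pooled)) for row in class_weights]
    return softmax(logits)


random.seed(0)
num_positions, channels, num_classes = 6, 4, 3

# Placeholder for backbone features: one row per spatio-temporal position.
feature_map = [
    [random.gauss(0, 1) for _ in range(channels)] for _ in range(num_positions)
]
# Hypothetical masks marking which positions belong to each of three objects.
object_masks = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
]
class_weights = [
    [random.gauss(0, 1) for _ in range(channels)] for _ in range(num_classes)
]

object_tokens = [masked_average_pool(feature_map, m) for m in object_masks]
related_tokens = relate_objects(object_tokens)
action_probs = classify(related_tokens, class_weights)
print(action_probs)
```

The point of the sketch is the data flow: per-object vectors are extracted from shared backbone features, each vector is updated with information from the other objects, and the result is pooled into a single action prediction. A real model would use learned projections and trained weights at every stage.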

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Hand Gesture Recognition Systems