Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks

Joanna Materzynska; Tete Xiao; Roei Herzig; Huijuan Xu; Xiaolong Wang; Trevor Darrell

arXiv:1912.09930·cs.CV·September 15, 2020

TL;DR

This paper introduces a novel model for compositional action recognition that explicitly reasons about object-agent spatial-temporal interactions, enabling better generalization to unseen object-action combinations.

Contribution

The paper proposes a new model that explicitly captures geometric relations in object-agent interactions and introduces a compositional recognition task with non-overlapping training and test verb-noun pairs.
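
The non-overlap condition on verb-noun pairs can be checked mechanically. The following is a minimal sketch, not the authors' split code: the function names and the toy verb-noun pairs are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (verb, noun); a hypothetical label format


def overlapping_pairs(train: Iterable[Pair], test: Iterable[Pair]) -> Set[Pair]:
    """Return the verb-noun combinations that appear in both splits."""
    return set(train) & set(test)


def is_compositional_split(train: Iterable[Pair], test: Iterable[Pair]) -> bool:
    """True when no verb-noun combination is shared between train and test."""
    return not overlapping_pairs(train, test)


# Toy data (made up): each verb and noun is seen in training,
# but the test set pairs them in combinations not seen during training.
train_pairs = [("move", "cup"), ("push", "book")]
test_pairs = [("move", "book"), ("push", "cup")]

print(is_compositional_split(train_pairs, test_pairs))
```

A split that reuses even one training combination at test time would make `overlapping_pairs` non-empty, so the same helper can be used to list the offending pairs.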

Findings

01 Effective on compositional action recognition task

02 Improves generalization in few-shot settings

03 Utilizes dense object annotations for training

Abstract

Human action is naturally compositional: humans can easily recognize and perform actions with objects that are different from those used in training demonstrations. In this paper, we study the compositionality of action by looking into the dynamics of subject-object interactions. We propose a novel model which can explicitly reason about the geometric relations between constituent objects and an agent performing an action. To train our model, we collect dense object box annotations on the Something-Something dataset. We propose a novel compositional action recognition task where the training combinations of verbs and nouns do not overlap with the test set. The novel aspects of our model are applicable to activities with prominent object interaction dynamics and to objects which can be tracked using state-of-the-art approaches; for activities without clearly defined spatial object-agent…
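
To make "geometric relations between constituent objects and an agent" concrete, here is a minimal sketch that derives simple per-frame relations from two box tracks. It is an illustration under stated assumptions, not the paper's model: the `(x1, y1, x2, y2)` box format, the choice of center offset and IoU as features, and all function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def center(box: Box) -> Tuple[float, float]:
    """Center point of a box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def relation_features(agent_track: List[Box], object_track: List[Box]):
    """Per-frame (dx, dy, iou) between an agent box and an object box."""
    feats = []
    for a, o in zip(agent_track, object_track):
        (ax, ay), (ox, oy) = center(a), center(o)
        feats.append((ox - ax, oy - ay, iou(a, o)))
    return feats


# Toy two-frame tracks (made up): the object moves onto the agent's box.
agent = [(0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0)]
obj = [(1.0, 1.0, 3.0, 3.0), (0.0, 0.0, 2.0, 2.0)]
features = relation_features(agent, obj)
```

Features of this kind depend only on box geometry over time, not on object appearance, which is one way to see why box-level reasoning might transfer to objects not seen with a given action during training.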

Code & Models

Repositories

joaanna/something_else (PyTorch)

Videos

Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks (YouTube)
