Recur, Attend or Convolve? On Whether Temporal Modeling Matters for Cross-Domain Robustness in Action Recognition

Sofia Broomé; Ernest Pokropek; Boyu Li; Hedvig Kjellström

arXiv:2112.12175 · cs.CV · October 12, 2022

TL;DR

This paper investigates whether the choice of temporal modeling (recurrence, attention, or convolution) affects cross-domain robustness in action recognition, and highlights the importance of physical inductive biases over heavily parameterized models.

Contribution

It introduces the Temporal Shape dataset and modified Diving48 domains to systematically assess the impact of temporal modeling on texture bias and robustness.

Findings

01 Recurrence may enhance domain shift robustness in action recognition.

02 Temporal shape cues are crucial for generalization across domains.

03 Physical inductive biases outperform texture biases in robustness.

Abstract

Most action recognition models today are highly parameterized, and evaluated on datasets with appearance-wise distinct classes. It has also been shown that 2D Convolutional Neural Networks (CNNs) tend to be biased toward texture rather than shape in still image recognition tasks, in contrast to humans. Taken together, this raises suspicion that large video models partly learn spurious spatial texture correlations rather than to track relevant shapes over time to infer generalizable semantics from their movement. A natural way to avoid parameter explosion when learning visual patterns over time is to make use of recurrence. Biological vision consists of abundant recurrent circuitry, and is superior to computer vision in terms of domain shift generalization. In this article, we empirically study whether the choice of low-level temporal modeling has consequences for texture bias and…

Code & Models

Repositories

sofiabroome/temporal-shape-dataset (PyTorch, official)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning

Methods: Spatio-temporal stability analysis