Quantifying and Learning Static vs. Dynamic Information in Deep Spatiotemporal Networks

Matthew Kowal; Mennatullah Siam; Md Amirul Islam; Neil D. B. Bruce; Richard P. Wildes; Konstantinos G. Derpanis

arXiv:2211.01783 · cs.CV · September 17, 2024

TL;DR

This paper introduces a method to quantify static versus dynamic information biases in deep spatiotemporal models, revealing prevalent static bias and proposing techniques to mitigate it for improved video analysis.

Contribution

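It presents a novel approach for measuring the static and dynamic biases of spatiotemporal models, applies it to three video tasks (action recognition, automatic video object segmentation, and video instance segmentation), and offers insights and debiasing strategies.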

Findings

01

Most models are biased toward static information.

02

Some datasets assumed to be biased toward dynamics are actually biased toward static information.

03

Individual channels can be selectively biased toward static or dynamic information.

Abstract

There is limited understanding of the information captured by deep spatiotemporal models in their intermediate representations. For example, while evidence suggests that action recognition algorithms are heavily influenced by visual appearance in single frames, no quantitative methodology exists for evaluating such static bias in the latent representation compared to bias toward dynamics. We tackle this challenge by proposing an approach for quantifying the static and dynamic biases of any spatiotemporal model, and apply our approach to three tasks: action recognition, automatic video object segmentation (AVOS), and video instance segmentation (VIS). Our key findings are: (i) Most examined models are biased toward static information. (ii) Some datasets that are assumed to be biased toward dynamics are actually biased toward static information. (iii) Individual channels in an architecture…
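This page does not describe how the bias is measured, so the following is only a hypothetical illustration of the idea of scoring a layer and its channels for static vs. dynamic information, not the authors' estimator. It assumes access to two sets of video pairs: pairs that share appearance (the static factor) but differ in motion, and pairs that share motion (the dynamic factor) but differ in appearance. Each channel is scored by the Pearson correlation of its activations across the pairs of each set, and the layer-level split is a softmax over the mean correlations. The function names (`per_channel_correlation`, `static_dynamic_bias`, `encode`) and the toy encoder are assumptions made for this sketch.

```python
import numpy as np


def per_channel_correlation(za: np.ndarray, zb: np.ndarray) -> np.ndarray:
    """Pearson correlation per channel between paired activations.

    za, zb: shape (num_pairs, num_channels); row i of each holds the
    activations of the two videos that form pair i.
    """
    za = za - za.mean(axis=0)
    zb = zb - zb.mean(axis=0)
    num = (za * zb).sum(axis=0)
    den = np.sqrt((za ** 2).sum(axis=0) * (zb ** 2).sum(axis=0)) + 1e-8
    return num / den


def static_dynamic_bias(z_static_a, z_static_b, z_dynamic_a, z_dynamic_b):
    """Hypothetical static/dynamic bias score for one layer (illustrative only).

    z_static_*: pairs that share appearance (static factor) but not motion.
    z_dynamic_*: pairs that share motion (dynamic factor) but not appearance.
    """
    rho_s = per_channel_correlation(z_static_a, z_static_b)
    rho_d = per_channel_correlation(z_dynamic_a, z_dynamic_b)
    # Layer level: softmax over the mean correlation for each factor.
    scores = np.array([rho_s.mean(), rho_d.mean()])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Channel level: True where a channel tracks appearance more than motion.
    static_channels = rho_s > rho_d
    return weights, static_channels


# Toy check with a synthetic "model" whose channels 0-2 copy an appearance
# factor and channel 3 copies a motion factor (plus small noise).
rng = np.random.default_rng(0)
n = 500


def encode(appearance, motion):
    # Hypothetical stand-in for a spatiotemporal network's layer activations.
    noise = 0.1 * rng.normal(size=(len(appearance), 4))
    return np.stack([appearance, appearance, appearance, motion], axis=1) + noise


app, mot = rng.normal(size=n), rng.normal(size=n)
static_a = encode(app, rng.normal(size=n))   # same appearance, new motion
static_b = encode(app, rng.normal(size=n))
dynamic_a = encode(rng.normal(size=n), mot)  # same motion, new appearance
dynamic_b = encode(rng.normal(size=n), mot)

weights, static_channels = static_dynamic_bias(static_a, static_b, dynamic_a, dynamic_b)
print("static/dynamic weights:", weights)
print("static-biased channels:", static_channels)
```

Because three of the four toy channels respond only to appearance, the static weight comes out larger than the dynamic one and channels 0-2 are flagged as static-biased, mirroring in miniature the kind of layer-level and channel-level findings listed above.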

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications

Methods: Dropout