Attentive Action and Context Factorization

Yang Wang; Vinh Tran; Gedas Bertasius; Lorenzo Torresani; Minh Hoai

arXiv:1904.05410 · cs.CV · April 12, 2019 · 5 cites

TL;DR

This paper introduces an attention-based method for human action recognition that localizes actions and context in videos without detailed annotations, improving accuracy and interpretability.

Contribution

A novel weakly supervised attentional mechanism that separates human actions from context in videos for recognition tasks.

Findings

01. Improved action recognition accuracy on multiple datasets.

02. Enhanced interpretability through localization of actions and context.

03. Effective separation of action and context factors without detailed annotations.

Abstract

We propose a method for human action recognition, one that can localize the spatiotemporal regions that 'define' the actions. This is a challenging task due to the subtlety of human actions in video and the co-occurrence of contextual elements. To address this challenge, we utilize conjugate samples of human actions, which are video clips that are contextually similar to human action samples but do not contain the action. We introduce a novel attentional mechanism that can spatially and temporally separate human actions from the co-occurring contextual factors. The separation of the action and context factors is weakly supervised, eliminating the need for laboriously detailed annotation of these two factors in training samples. Our method can be used to build human action classifiers with higher accuracy and better interpretability. Experiments on several human action recognition…
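To make the idea of separating action from context more concrete, here is a minimal illustrative sketch. It is not the paper's architecture, which the abstract does not specify. It assumes each spatiotemporal location of a clip has a feature vector and an attention score, that the scores are squashed with a sigmoid, and that the action and context summaries are pooled with complementary weights. The function name `factorize`, the sigmoid, and the complementary pooling are all assumptions made for illustration.

```python
import math

def factorize(features, attention_logits):
    """Split per-location features into action and context summaries.

    Hypothetical illustration only, not the authors' method.
    features: list of N feature vectors, one per spatiotemporal location.
    attention_logits: list of N unnormalized attention scores.
    """
    # Assumed: sigmoid maps each score to an attention weight in [0, 1].
    attn = [1.0 / (1.0 + math.exp(-z)) for z in attention_logits]
    dim = len(features[0])
    eps = 1e-8  # guards against division by zero when all weights vanish

    def pool(weights):
        # Weighted average of the location features.
        total = sum(weights) + eps
        return [
            sum(w * f[d] for w, f in zip(weights, features)) / total
            for d in range(dim)
        ]

    action = pool(attn)                      # high-attention locations
    context = pool([1.0 - a for a in attn])  # complementary locations
    return action, context, attn

# Toy clip with four locations: the first two are treated as "action",
# the last two as "context" (made-up numbers).
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
logits = [20.0, 20.0, -20.0, -20.0]
action, context, attn = factorize(features, logits)
```

In a setup like this, a conjugate sample (similar context, no action) could supervise the context summary without location-level labels, which matches the weak supervision the abstract describes. How the paper actually wires this up is not stated in the text above.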

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications