Weakly-Supervised Action Localization and Action Recognition using Global-Local Attention of 3D CNN

Novanto Yudistira; Muthu Subash Kavitha; Takio Kurita

arXiv:2012.09542·cs.CV·August 17, 2022

TL;DR

This paper introduces a weakly-supervised method for action localization and recognition in videos using global-local attention mechanisms in 3D CNNs, improving interpretability and accuracy.

Contribution

It proposes a novel global-local gradient aggregation and attention gating approach for enhanced visual explanations and action recognition in 3D CNNs.

Findings

01. Improved visual attribution and localization accuracy.

02. Enhanced action recognition performance over baseline.

03. Effective use of layer-wise attention for video analysis.

Abstract

A 3D Convolutional Neural Network (3D CNN) captures spatial and temporal information in 3D data such as video sequences. However, the convolution and pooling mechanisms make some information loss seemingly unavoidable. To improve visual explanations and classification in 3D CNNs, we propose two approaches: (i) aggregating layer-wise global-to-local (global-local) discrete gradients using a trained 3DResNext network, and (ii) implementing an attention gating network to improve the accuracy of action recognition. The proposed approach is intended to show the usefulness of every layer, termed global-local attention, in a 3D CNN via visual attribution, weakly-supervised action localization, and action recognition. First, the 3DResNext is trained and applied for action classification, using backpropagation with respect to the maximum predicted class. The gradients and activations of every layer are then…
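
The abstract is cut off before it states how the per-layer gradients and activations are combined, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a Grad-CAM-style map per layer (channel weights taken as spatio-temporally averaged gradients) and assumes that the layer maps are upsampled to a common size and simply averaged. All function names (`layer_attention`, `upsample_nearest`, `normalize`, `global_local_attention`) are hypothetical, and random arrays stand in for the activations and gradients of a real 3DResNext.

```python
import numpy as np


def layer_attention(activations, gradients):
    """Grad-CAM-style map for one layer (an assumption, not the paper's rule).

    activations, gradients: arrays of shape (C, T, H, W).
    Returns a non-negative map of shape (T, H, W).
    """
    # Channel weights: gradients averaged over time and space.
    weights = gradients.mean(axis=(1, 2, 3))
    # Weighted sum over channels, then ReLU to keep positive evidence only.
    cam = np.tensordot(weights, activations, axes=(0, 0))
    return np.maximum(cam, 0.0)


def upsample_nearest(volume, target_shape):
    """Nearest-neighbour resize of a (T, H, W) volume to target_shape."""
    idx = [np.arange(t) * s // t for s, t in zip(volume.shape, target_shape)]
    return volume[np.ix_(*idx)]


def normalize(volume, eps=1e-8):
    """Shift and scale a map into [0, 1]; a constant map becomes all zeros."""
    volume = volume - volume.min()
    return volume / (volume.max() + eps)


def global_local_attention(layers, target_shape):
    """Average the normalized, upsampled maps of all layers.

    layers: list of (activations, gradients) pairs, e.g. ordered from
    deep (global) to shallow (local). Averaging is an assumed choice.
    """
    maps = [
        normalize(upsample_nearest(layer_attention(a, g), target_shape))
        for a, g in layers
    ]
    return np.mean(maps, axis=0)


# Stand-in data: a coarse deep layer and a finer shallow layer.
rng = np.random.default_rng(0)
deep = (rng.normal(size=(8, 2, 4, 4)), rng.normal(size=(8, 2, 4, 4)))
shallow = (rng.normal(size=(4, 8, 16, 16)), rng.normal(size=(4, 8, 16, 16)))

attention = global_local_attention([deep, shallow], target_shape=(8, 16, 16))
print(attention.shape)  # (8, 16, 16)
```

In a real pipeline the `(activations, gradients)` pairs would come from forward and backward hooks on the trained network, with gradients taken with respect to the maximum predicted class score. Thresholding the resulting volume is one common way to turn such a map into a weakly-supervised localization, though the abstract as shown here does not confirm that step.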

Taxonomy

Methods: 3 Dimensional Convolutional Neural Network · Global-Local Attention · Convolution