Hierarchical Multi-scale Attention Networks for Action Recognition
Shiyang Yan, Jeremy S. Smith, Wenjin Lu, Bailing Zhang

TL;DR
This paper introduces a Hierarchical Multi-scale Attention Network (HM-AN) that combines hierarchical RNNs and attention mechanisms with Gumbel-softmax for improved action recognition in videos.
Contribution
It proposes HM-AN, a novel model that integrates a hierarchical multi-scale RNN with an attention mechanism and trains it using a recently proposed gradient estimator for stochastic neurons (Gumbel-softmax), improving action recognition performance.
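As a rough illustration of the Gumbel-softmax estimator named in the TL;DR, the sketch below perturbs a vector of logits with Gumbel(0, 1) noise and applies a temperature-scaled softmax. Lower temperatures push the sample toward a one-hot vector. This is a generic, forward-pass-only sketch in plain Python, not the authors' code. The helper names and the `hard` flag are illustrative assumptions. The straight-through gradient trick that usually accompanies the hard variant needs an autodiff framework, so it appears only as a comment.

```python
import math
import random
from typing import List


def sample_gumbel(rng: random.Random, eps: float = 1e-20) -> float:
    """Draw one sample from the standard Gumbel(0, 1) distribution via -log(-log(U))."""
    u = rng.random()
    return -math.log(-math.log(u + eps) + eps)


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits: List[float], temperature: float,
                   rng: random.Random, hard: bool = False) -> List[float]:
    """Relaxed categorical sample (hypothetical helper, forward pass only).

    With hard=True the soft sample is replaced by its argmax one-hot vector.
    In an autodiff framework, the straight-through estimator would use this
    one-hot value in the forward pass but back-propagate through the soft y.
    """
    noisy = [(logit + sample_gumbel(rng)) / temperature for logit in logits]
    y = softmax(noisy)
    if hard:
        k = max(range(len(y)), key=y.__getitem__)
        y = [1.0 if i == k else 0.0 for i in range(len(y))]
    return y


rng = random.Random(0)
soft = gumbel_softmax([1.0, 2.0, 0.5], temperature=0.5, rng=rng)
hard = gumbel_softmax([1.0, 2.0, 0.5], temperature=0.5, rng=rng, hard=True)
print(soft, hard)
```

Lowering `temperature` makes samples closer to one-hot. When gradients are involved, it also tends to increase their variance, so the temperature is a tuning choice rather than a fixed constant.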
Findings
HM-AN outperforms an LSTM with attention on action recognition.
The model effectively captures hierarchical temporal structures.
Both the learned attention regions and the hierarchical temporal features can be visualized.
Abstract
Recurrent Neural Networks (RNNs) have been widely used in natural language processing and computer vision. Among them, the Hierarchical Multi-scale RNN (HM-RNN), a recently proposed multi-scale hierarchical RNN, can automatically learn the hierarchical temporal structure from data. In this paper, we extend this work to the computer vision task of action recognition. However, in sequence-to-sequence models such as RNNs, it is normally very hard to discover the relationships between inputs and outputs given static inputs. As a solution, an attention mechanism can be applied to extract the relevant information from the input, thus facilitating the modeling of input-output relationships. Based on these considerations, we propose a novel attention network, namely the Hierarchical Multi-scale Attention Network (HM-AN), by combining the HM-RNN and the attention mechanism, and apply it to action…
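The abstract's idea of an attention mechanism that extracts the relevant part of the input can be illustrated with a generic soft-attention step over the spatial regions of a frame. This sketch assumes that each frame is represented by a list of region feature vectors (for example, cells of a CNN feature map) and that a query vector is derived from the RNN's previous hidden state. The dot-product scoring and the helper names are illustrative assumptions, not the HM-AN formulation, which the TL;DR associates with Gumbel-softmax.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(regions: List[Vector], query: Vector) -> Tuple[Vector, Vector]:
    """Score each region against the query, normalise the scores, and
    return (attention weights, attention-weighted context vector)."""
    # Dot-product relevance score per region (assumed scoring function).
    scores = [sum(r_i * q_i for r_i, q_i in zip(region, query)) for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    # Context = sum over regions of weight * region feature.
    context = [sum(w * region[d] for w, region in zip(weights, regions)) for d in range(dim)]
    return weights, context


regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]  # hypothetical projection of the previous hidden state
weights, context = soft_attention(regions, query)
print(weights, context)
```

Regions 0 and 2 match the query equally and receive the same, larger weight, while region 1 is down-weighted. In a typical attention-based recognizer, the context vector would then feed the RNN at the next time step. A hard-attention variant would instead sample a single region from `weights`, which is where a Gumbel-softmax relaxation could make that discrete choice trainable.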
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Neural Network Applications
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
