Action Recognition with Deep Multiple Aggregation Networks
Ahmed Mazari, Hichem Sahbi

TL;DR
This paper introduces a hierarchical pooling method for action recognition that captures multiple temporal granularities, improving classification by better modeling temporal dynamics in videos.
Contribution
It proposes a novel tree-structured hierarchical pooling approach that adaptively combines different temporal levels, addressing limitations of traditional max or average pooling.
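The idea of pooling over a temporal tree can be illustrated with a small sketch. This is not the authors' implementation: it assumes a binary tree over the frame axis, average pooling inside each node, and a softmax over per-node scores as the combination rule. The names `tree_segments`, `hierarchical_pool`, and `node_scores` are hypothetical, and the scores would be learned in a real network rather than passed in by hand.

```python
import math


def average_pool(frames):
    """Average a non-empty list of equal-length feature vectors."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def tree_segments(num_frames, depth):
    """List (level, start, end) spans of a binary temporal tree, coarse to fine.

    Level 0 covers the whole video; level l splits it into 2**l spans.
    The binary split is an assumption made for this sketch.
    """
    if num_frames < 2 ** (depth - 1):
        raise ValueError("video too short for the requested tree depth")
    segments = []
    for level in range(depth):
        parts = 2 ** level
        for k in range(parts):
            start = (k * num_frames) // parts
            end = ((k + 1) * num_frames) // parts
            segments.append((level, start, end))
    return segments


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_pool(frames, depth, node_scores=None):
    """Combine per-node average pools with softmax weights.

    node_scores is a hypothetical stand-in for learned parameters;
    equal scores give every node of the tree the same weight.
    """
    segments = tree_segments(len(frames), depth)
    if node_scores is None:
        node_scores = [0.0] * len(segments)
    if len(node_scores) != len(segments):
        raise ValueError("need one score per tree node")
    weights = softmax(node_scores)
    pooled = [0.0] * len(frames[0])
    for weight, (_, start, end) in zip(weights, segments):
        node = average_pool(frames[start:end])
        for d, value in enumerate(node):
            pooled[d] += weight * value
    return pooled
```

Raising the score of a fine-grained node shifts the pooled descriptor toward that part of the video, while equal scores mix all levels evenly. The output has the feature dimension of a single frame regardless of how many frames are passed in, which is one simple way to read the claim that the method is agnostic to video length.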
Findings
Improved accuracy on UCF-101, HMDB-51, and JHMDB-21 datasets.
Hierarchical pooling captures diverse temporal granularities effectively.
The method is agnostic to video length and resolution.
Abstract
Most of the current action recognition algorithms are based on deep networks which stack multiple convolutional, pooling and fully connected layers. While convolutional and fully connected operations have been widely studied in the literature, the design of pooling operations that handle action recognition, with different sources of temporal granularity in action categories, has comparatively received less attention, and existing solutions rely mainly on max or averaging operations. The latter are clearly powerless to fully exhibit the actual temporal granularity of action categories and thereby constitute a bottleneck in classification performances. In this paper, we introduce a novel hierarchical pooling design that captures different levels of temporal granularity in action recognition. Our design principle is coarse-to-fine and achieved using a tree-structured network; as we…
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications
