Learnable pooling with Context Gating for video classification
Antoine Miech, Ivan Laptev, Josef Sivic

TL;DR
This paper combines learnable, clustering-based pooling of audio and visual features with a non-linear unit called Context Gating, which models interdependencies among network activations. The resulting model outperforms existing methods for video classification on the large-scale Youtube-8M v2 dataset.
Contribution
It introduces the Context Gating unit and studies clustering-based aggregation layers as learnable alternatives to temporal averaging and recurrent networks for pooling frame-level features over time.
Findings
Outperforms existing methods on Youtube-8M v2 dataset
Demonstrates effectiveness of Context Gating in modeling feature interdependencies
Shows benefits of clustering-based aggregation for video representation
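As a rough illustration of what a clustering-based aggregation layer computes, the sketch below pools a variable number of frame descriptors into one fixed-size vector. It assumes a NetVLAD-style formulation (softmax soft-assignment to K clusters, then summed residuals to each cluster center), which this page does not spell out; the function name `cluster_pool`, the parameter names, and the final L2 normalization are illustrative choices, and in a real model the centers and assignment parameters would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cluster_pool(frames, centers, weights, bias):
    """Aggregate T frame descriptors into one K*D video descriptor.

    frames  : T lists of D floats (frame-level features)
    centers : K lists of D floats (hypothetical learned cluster centers)
    weights : K lists of D floats, bias : K floats (soft-assignment params)
    """
    num_clusters, dim = len(centers), len(centers[0])
    pooled = [[0.0] * dim for _ in range(num_clusters)]
    for x in frames:
        # Soft-assign the frame to every cluster.
        scores = [sum(w * xj for w, xj in zip(weights[k], x)) + bias[k]
                  for k in range(num_clusters)]
        assign = softmax(scores)
        # Accumulate assignment-weighted residuals to each center.
        for k in range(num_clusters):
            for j in range(dim):
                pooled[k][j] += assign[k] * (x[j] - centers[k][j])
    flat = [v for row in pooled for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    # L2-normalize the flattened descriptor (skipped if it is all zeros).
    return [v / norm for v in flat] if norm > 0 else flat


# Toy input: three 2-D "frames" and two clusters (all values made up).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
centers = [[0.0, 0.0], [1.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
video_descriptor = cluster_pool(frames, centers, weights, bias)
```

The output length is K*D regardless of how many frames are supplied, and because the residuals are summed, the descriptor does not depend on frame order. That order-invariance is the main contrast with recurrent aggregation such as LSTM or GRU.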
Abstract
Current methods for video analysis often extract frame-level features using pre-trained convolutional neural networks (CNNs). Such features are then aggregated over time e.g., by simple temporal averaging or more sophisticated recurrent neural networks such as long short-term memory (LSTM) or gated recurrent units (GRU). In this work we revise existing video representations and study alternative methods for temporal aggregation. We first explore clustering-based aggregation layers and propose a two-stream architecture aggregating audio and visual features. We then introduce a learnable non-linear unit, named Context Gating, aiming to model interdependencies among network activations. Our experimental results show the advantage of both improvements for the task of video classification. In particular, we evaluate our method on the large-scale multi-modal Youtube-8M v2 dataset and…
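The Context Gating unit mentioned in the abstract can be sketched in a few lines. This minimal version assumes the gate is an element-wise sigmoid of a linear map of the input, multiplied back onto the input, i.e. Y = σ(WX + b) ∘ X; the names `context_gating`, `weights`, and `bias` are hypothetical, and the parameters shown are hand-picked rather than learned.

```python
import math


def sigmoid(z):
    """Logistic function in a form that avoids overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def context_gating(x, weights, bias):
    """Return sigmoid(weights @ x + bias) * x, element-wise.

    x       : list of n floats (input feature vector)
    weights : n x n list of lists (hypothetical learned matrix)
    bias    : list of n floats (hypothetical learned bias)
    """
    gates = [
        sigmoid(sum(w * xj for w, xj in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]
    # Each gate depends on the whole input vector, so one feature can
    # suppress or pass another; the output keeps the input dimension.
    return [g * xi for g, xi in zip(gates, x)]


features = [2.0, -4.0]
zero_weights = [[0.0, 0.0], [0.0, 0.0]]
zero_bias = [0.0, 0.0]
# With zero weights and bias every gate is sigmoid(0) = 0.5.
halved = context_gating(features, zero_weights, zero_bias)
```

Because the gates lie in (0, 1) and are computed from all input dimensions, the unit can only re-weight existing activations, never change their sign or dimensionality. That makes it easy to place after a pooling layer or a classifier output without altering the surrounding shapes.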
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques
