Deep Adaptive Temporal Pooling for Activity Recognition
Sibo Song, Ngai-Man Cheung, Vijay Chandrasekhar, Bappaditya Mandal

TL;DR
This paper introduces Deep Adaptive Temporal Pooling (DATP), a learnable module that enhances activity recognition by adaptively weighting video segments based on their importance, improving accuracy without extra supervision.
Contribution
The paper presents DATP, a novel, differentiable, self-attention-based pooling method that improves long-term temporal modeling in activity recognition tasks.
Findings
DATP improves accuracy on the UCF101, HMDB51, and Kinetics datasets.
DATP learns to focus on key video segments during training.
DATP enhances training of frame-level feature extractors.
Abstract
Deep neural networks have recently achieved competitive accuracy for human activity recognition. However, there is room for improvement, especially in modeling long-term temporal importance and determining the activity relevance of different temporal segments in a video. To address this problem, we propose a learnable and differentiable module: Deep Adaptive Temporal Pooling (DATP). DATP applies a self-attention mechanism to adaptively pool the classification scores of different video segments. Specifically, using frame-level features, DATP regresses the importance of different temporal segments and generates weights for them. Remarkably, DATP is trained using only the video-level activity class label; no additional supervision is needed. We conduct extensive experiments to investigate various input features and different weight models. Experimental results…
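The pooling step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the linear importance regressor, the softmax normalization, and the names `adaptive_temporal_pool`, `w`, and `b` are assumptions chosen for illustration, and the paper itself reports investigating several input features and weight models. The sketch only shows the general idea of regressing one importance value per segment from its features and using the normalized values to pool per-segment class scores into a video-level score.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def adaptive_temporal_pool(features, scores, w, b=0.0):
    """Pool per-segment class scores with feature-dependent weights.

    features: (T, D) array, one feature vector per temporal segment.
    scores:   (T, C) array, one class-score vector per temporal segment.
    w, b:     hypothetical linear weight model (an assumption, not taken
              from the paper) mapping a feature vector to one scalar.
    """
    importance = features @ w + b      # (T,) one raw importance per segment
    weights = softmax(importance)      # (T,) non-negative, sums to 1
    video_scores = weights @ scores    # (C,) weighted sum over segments
    return video_scores, weights


# Toy example with random data: 5 segments, 8-dim features, 3 classes.
rng = np.random.default_rng(0)
T, D, C = 5, 8, 3
features = rng.normal(size=(T, D))
scores = rng.normal(size=(T, C))
w = rng.normal(size=D)

video_scores, weights = adaptive_temporal_pool(features, scores, w)
print("segment weights:", np.round(weights, 3))
print("video-level scores:", np.round(video_scores, 3))
```

Because every operation here is differentiable, a loss computed on `video_scores` against the video-level label alone would produce gradients for `w` and `b`, which matches the abstract's point that no segment-level supervision is required. With `w` set to zero the weights become uniform and the sketch reduces to plain average pooling over segments.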
