Higher-order Pooling of CNN Features via Kernel Linearization for Action Recognition
Anoop Cherian, Piotr Koniusz, Stephen Gould

TL;DR
This paper introduces Higher-order Kernel (HOK) descriptors, which pool higher-order correlations of CNN classifier scores across the frames of a video to capture their temporal evolution and improve action recognition accuracy.
Contribution
The paper proposes a novel higher-order pooling method using kernel linearization to generate descriptors from CNN scores, enhancing video-level action recognition.
Findings
Achieves state-of-the-art results on fine-grained action datasets
Demonstrates the effectiveness of higher-order correlations in video classification
Outperforms traditional first-order pooling methods
Abstract
Most successful deep learning algorithms for action recognition extend models designed for image-based tasks such as object recognition to video. Such extensions are typically trained for actions on single video frames or very short clips, and then their predictions from sliding windows over the video sequence are pooled for recognizing the action at the sequence level. Usually this pooling step uses the first-order statistics of frame-level action predictions. In this paper, we explore the advantages of using higher-order correlations; specifically, we introduce Higher-order Kernel (HOK) descriptors generated from the late fusion of CNN classifier scores from all the frames in a sequence. To generate these descriptors, we use the idea of kernel linearization. Specifically, a similarity kernel matrix, which captures the temporal evolution of deep classifier scores, is first linearized…
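To make the contrast between first-order and higher-order pooling concrete, here is a minimal sketch in Python with NumPy. It is not the HOK descriptor and does not perform kernel linearization; it only compares average pooling of frame-level classifier scores with a plain second-order (outer-product) pooling. The function names, the toy score matrices, and the choice of second order are assumptions made for illustration.

```python
import numpy as np


def first_order_pool(scores):
    """Average frame-level scores: (T, C) array -> (C,) vector."""
    return scores.mean(axis=0)


def second_order_pool(scores):
    """Average the outer products of per-frame score vectors.

    (T, C) array -> vector of length C * (C + 1) / 2, holding the upper
    triangle of the symmetric C x C co-occurrence matrix.
    """
    num_frames, num_classes = scores.shape
    cooccurrence = scores.T @ scores / num_frames
    upper = np.triu_indices(num_classes)
    return cooccurrence[upper]


# Two hypothetical two-frame sequences over two classes.
# seq_a switches confidently from class 0 to class 1.
# seq_b is undecided in both frames.
seq_a = np.array([[1.0, 0.0], [0.0, 1.0]])
seq_b = np.array([[0.5, 0.5], [0.5, 0.5]])

# First-order pooling cannot tell the two sequences apart.
mean_a = first_order_pool(seq_a)
mean_b = first_order_pool(seq_b)

# Second-order pooling separates them through the cross-class term.
second_a = second_order_pool(seq_a)
second_b = second_order_pool(seq_b)

print(mean_a, mean_b)
print(second_a, second_b)
```

In this toy case both sequences have the same mean score vector, while their second-order statistics differ in the cross-class entry, which is the kind of information a first-order pooling step discards.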
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Gait Recognition and Analysis
