Multi-kernel learning of deep convolutional features for action recognition
Biswa Sengupta, Yu Qian

TL;DR
This paper introduces a multi-kernel learning approach that combines multi-stream deep CNNs with support vector machines for action recognition, achieving close to state-of-the-art results on the challenging 51-class HMDB-51 dataset.
Contribution
It proposes a novel architecture called pillar networks that integrates deep CNNs with multi-kernel SVMs and hand-crafted features for improved video action recognition.
Findings
Achieved close to state-of-the-art performance on the HMDB-51 dataset.
Demonstrated the effectiveness of combining deep features with hand-crafted features.
Showed robustness across diverse video conditions.
Abstract
Image understanding using deep convolutional networks has reached human-level performance, yet the closely related problem of video understanding, especially action recognition, has not reached the requisite level of maturity. We combine multi-kernel support vector machines (SVMs) with a multi-stream deep convolutional neural network to achieve close to state-of-the-art performance on a 51-class activity recognition problem (the HMDB-51 dataset); this dataset has proved particularly challenging for deep neural networks due to the heterogeneity in camera viewpoints, video quality, etc. The resulting architecture is named pillar networks because each (very) deep neural network acts as a pillar for the hierarchical classifiers. In addition, we illustrate that hand-crafted features such as improved dense trajectories (iDT) and Multi-skip Feature Stacking (MIFS), as additional…
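The abstract describes fusing several feature streams through a multi-kernel SVM. The sketch below illustrates only the core idea: one kernel is computed per stream, the kernels are combined as a weighted sum, and the result is passed to a standard SVM. It is not the authors' implementation. The RBF kernels, the fixed uniform weights, the stream dimensions, and the synthetic data are all assumptions made for illustration. Genuine multi-kernel learning would also learn the per-kernel weights rather than fix them. The sketch uses scikit-learn's `rbf_kernel` and `SVC(kernel="precomputed")`.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC


def combined_kernel(streams_a, streams_b, weights, gamma=None):
    """Weighted sum of per-stream RBF kernels: K = sum_m w_m * K_m.

    A convex combination of positive semi-definite kernels is itself PSD,
    so the result is a valid SVM kernel.
    """
    K = np.zeros((streams_a[0].shape[0], streams_b[0].shape[0]))
    for Xa, Xb, w in zip(streams_a, streams_b, weights):
        K += w * rbf_kernel(Xa, Xb, gamma=gamma)
    return K


rng = np.random.default_rng(0)
n_train, n_test, n_classes = 60, 15, 3
# Hypothetical stream dimensions, standing in for e.g. RGB-CNN,
# optical-flow-CNN and hand-crafted (iDT-like) descriptors.
dims = [128, 64, 32]
y_train = np.repeat(np.arange(n_classes), n_train // n_classes)
y_test = np.repeat(np.arange(n_classes), n_test // n_classes)

# Synthetic data: one class centre per stream, shared by train and test.
centers = [rng.normal(scale=3.0, size=(n_classes, d)) for d in dims]


def sample(labels):
    # One feature matrix per stream: class centre plus Gaussian noise.
    return [c[labels] + rng.normal(size=(len(labels), c.shape[1])) for c in centers]


train_streams = sample(y_train)
test_streams = sample(y_test)

# Fixed uniform weights (assumption; MKL would optimise these).
weights = np.full(len(dims), 1.0 / len(dims))
K_train = combined_kernel(train_streams, train_streams, weights)
K_test = combined_kernel(test_streams, train_streams, weights)  # rows: test, cols: train

clf = SVC(kernel="precomputed", C=1.0).fit(K_train, y_train)
acc = (clf.predict(K_test) == y_test).mean()
print(f"test accuracy: {acc:.2f}")
```

Because the kernel is precomputed, the test-time kernel must be evaluated against the training samples (shape `n_test × n_train`). Keeping the weights non-negative and summing to one means each stream's relative contribution can be read directly from its weight.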
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis
