Human Action Recognition using Local Two-Stream Convolution Neural Network Features and Support Vector Machines
David Torpey, Turgay Celik

TL;DR
This paper introduces a human action recognition method that extracts local appearance and motion features with 3D CNNs, combines them into a video-level representation, and classifies actions with a linear SVM, reporting improved accuracy on three benchmark datasets.
Contribution
It presents a novel approach that pairs local two-stream 3D CNN features with linear SVM classification and two simple preprocessing techniques, optical flow scaling and crop filling, to improve action recognition accuracy.
Findings
A linear SVM yields higher classification accuracy than the other classifiers evaluated.
Preprocessing techniques like optical flow scaling and crop filling boost performance.
Method achieves competitive results on benchmark datasets.
Abstract
This paper proposes a simple yet effective method for human action recognition in video. The proposed method separately extracts local appearance and motion features from sampled snippets of a video using state-of-the-art three-dimensional convolutional neural networks. These local features are then concatenated to form global representations, which are used to train a linear SVM that performs action classification using the full context of the video, as opposed to the partial context used in previous works. The videos undergo two simple proposed preprocessing steps: optical flow scaling and crop filling. We perform an extensive evaluation on three common benchmark datasets to empirically show the benefit of the SVM and of the two preprocessing steps.
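The classification stage described in the abstract (concatenate per-snippet appearance and motion features into one global vector, then train a linear SVM on it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes every video is sampled into the same fixed number of snippets and that the per-snippet feature vectors from the two 3D CNNs are already available (feature extraction and the two preprocessing steps are not shown). The helper names `global_representation`, `train_pegasos`, `train_one_vs_rest` and `predict` are hypothetical, and the linear SVM is trained here with the Pegasos sub-gradient method in a one-vs-rest scheme, a swapped-in, dependency-free solver that the paper does not necessarily use. The feature values in the toy usage are made up.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def global_representation(appearance: Sequence[Vector], motion: Sequence[Vector]) -> Vector:
    """Concatenate per-snippet appearance and motion features into one video-level vector.

    Hypothetical helper: assumes every video is sampled into the same number of
    snippets, so all global vectors have the same length.
    """
    if len(appearance) != len(motion):
        raise ValueError("each snippet needs both an appearance and a motion feature")
    video_vec: Vector = []
    for app, mot in zip(appearance, motion):
        video_vec.extend(app)  # local appearance (RGB-stream) feature
        video_vec.extend(mot)  # local motion (optical-flow-stream) feature
    return video_vec


def train_pegasos(xs: List[Vector], ys: List[int], lam: float = 0.01,
                  epochs: int = 50, seed: int = 0) -> Vector:
    """Binary linear SVM (labels +1/-1) trained with the Pegasos sub-gradient method.

    A constant 1.0 feature is appended to act as the bias; for simplicity the bias
    is regularised together with the other weights.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(xs[0]) + 1)
    order = list(range(len(xs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            x = xs[i] + [1.0]
            margin = ys[i] * sum(wj * xj for wj, xj in zip(w, x))
            # Regularisation step: shrink the weights.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss sub-gradient step, only when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, x)]
    return w


def train_one_vs_rest(xs: List[Vector], labels: List[str], **kw) -> Dict[str, Vector]:
    """Train one binary linear SVM per action class (one-vs-rest)."""
    return {c: train_pegasos(xs, [1 if y == c else -1 for y in labels], **kw)
            for c in sorted(set(labels))}


def predict(models: Dict[str, Vector], x: Vector) -> str:
    """Return the class whose linear SVM gives the highest score."""
    x1 = x + [1.0]
    return max(models, key=lambda c: sum(wj * xj for wj, xj in zip(models[c], x1)))


# Toy usage: made-up 2-D "CNN" features, 2 snippets per video.
walk = global_representation([[1.0, 0.0], [0.9, 0.1]], [[0.2, 0.0], [0.3, 0.0]])
run = global_representation([[0.0, 1.0], [0.1, 0.9]], [[0.0, 0.8], [0.0, 0.9]])
xs = [walk, [v * 1.1 for v in walk], run, [v * 1.1 for v in run]]
labels = ["walk", "walk", "run", "run"]
models = train_one_vs_rest(xs, labels)
print([predict(models, x) for x in xs])
```

Because the global vector is a plain concatenation, its length grows linearly with the number of snippets, which is why the sketch assumes a fixed snippet count per video. For real feature dimensions, an optimised off-the-shelf solver such as scikit-learn's `LinearSVC` (one-vs-rest by default for multiclass problems) would likely be the practical choice.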
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis
Methods: Support Vector Machine
