Harnessing the Deep Net Object Models for Enhancing Human Action Recognition
O. V. Ramana Murthy, Roland Goecke

TL;DR
This paper explores how object detection with pre-trained deep neural networks can improve human action recognition, particularly when the relevant objects are static or appear in the background, and reports state-of-the-art results on the HMDB51 and UCF101 datasets.
Contribution
The study introduces a method that combines deep object detectors with feature encoding techniques to enhance action recognition performance.
Findings
Achieved state-of-the-art accuracy on the HMDB51 and UCF101 datasets.
Improved recognition of actions involving static objects.
Demonstrated the effectiveness of multi-layer feature integration.
Abstract
In this study, the influence of objects is investigated in the scenario of human action recognition with a large number of classes. We hypothesize that the objects humans are interacting with have a strong say in determining the action being performed. In particular, if the objects are non-moving, such as objects appearing in the background, features such as spatio-temporal interest points and dense trajectories may fail to detect them. Hence, we propose to detect objects statically in every frame using pre-trained object detectors. Trained deep network models are used as the object detectors. Information from different layers, in conjunction with different encoding techniques, is extensively studied to obtain the richest feature vectors. This technique is observed to yield state-of-the-art performance on the HMDB51 and UCF101 datasets.
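The abstract describes running an object detector on every frame and then encoding the per-frame outputs into a single video-level feature vector. The following is a minimal sketch of that idea only, not the authors' implementation: it assumes the detector has already produced one score per object class for each frame, and it uses plain average or max pooling followed by L2 normalisation as a stand-in for the encoding techniques the paper studies. The function name `encode_video` and the toy scores are hypothetical.

```python
import math


def encode_video(frame_scores, method="avg"):
    """Pool per-frame object scores into one L2-normalised video vector.

    frame_scores: list of frames, each a list of per-class scores
                  (assumed to come from some pre-trained detector).
    method: "avg" or "max" pooling over frames (an illustrative choice,
            not necessarily the encoding used in the paper).
    """
    if not frame_scores:
        raise ValueError("need at least one frame")
    # Transpose so that each tuple holds one class's scores across frames.
    per_class = list(zip(*frame_scores))
    if method == "avg":
        pooled = [sum(c) / len(c) for c in per_class]
    elif method == "max":
        pooled = [max(c) for c in per_class]
    else:
        raise ValueError("method must be 'avg' or 'max'")
    # L2-normalise so videos of different lengths are comparable.
    norm = math.sqrt(sum(x * x for x in pooled))
    return [x / norm for x in pooled] if norm > 0 else pooled


# Toy example: 3 frames, 4 hypothetical object classes.
scores = [
    [0.9, 0.0, 0.1, 0.0],
    [0.8, 0.1, 0.0, 0.0],
    [0.7, 0.0, 0.2, 0.1],
]
video_vector = encode_video(scores, method="max")
```

A vector like `video_vector` could then be concatenated with motion features and passed to a classifier; which layers and encodings work best is the question the paper investigates empirically.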
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
