Video Violence Recognition and Localization Using a Semi-Supervised Hard Attention Model
Hamid Mohammadi, Ehsan Nazerfard

TL;DR
This paper introduces a semi-supervised hard attention model using reinforcement learning for video violence detection, achieving state-of-the-art accuracy without requiring attention annotations.
Contribution
The study presents a novel semi-supervised hard attention mechanism with reinforcement learning that improves video violence detection accuracy and reduces annotation requirements.
Findings
Achieved 90.4% accuracy on the RWF dataset.
Achieved 98.7% accuracy on the Hockey dataset.
Outperformed existing models with a semi-supervised approach.
Abstract
The significant growth of surveillance camera networks necessitates scalable AI solutions to efficiently analyze the large amount of video data produced by these networks. As a typical analysis performed on surveillance footage, video violence detection has recently received considerable attention. The majority of research has focused on improving existing methods through supervised learning, with little, if any, attention to semi-supervised learning approaches. In this study, a reinforcement learning model is introduced that can outperform existing models through a semi-supervised approach. The main novelty of the proposed method lies in the introduction of a semi-supervised hard attention mechanism. Using hard attention, the essential regions of videos are identified and separated from the non-informative parts of the data. A model's accuracy is improved by removing redundant data…
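The hard attention idea described in the abstract can be illustrated with a toy sketch. This is not the authors' architecture: the names `crop_region`, `reinforce_step`, and `candidates` are hypothetical, the 8x8 "frame" is synthetic, and the reward is a stand-in for "the downstream classifier was correct on the cropped region". It only shows how a policy could learn where to look from a task reward alone, without any attention annotations, using a plain REINFORCE update.

```python
import numpy as np

def crop_region(frame, top, left, size):
    """Hard attention: keep one square patch and discard the rest of the frame."""
    return frame[top:top + size, left:left + size]

def softmax(logits):
    z = logits - logits.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def reinforce_step(logits, action, reward, baseline, lr=0.1):
    """One REINFORCE update for a categorical policy over candidate regions."""
    probs = softmax(logits)
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0  # gradient of log pi(action) w.r.t. logits
    return logits + lr * (reward - baseline) * grad_log_pi

# Synthetic frame: only the bottom-right quadrant is "informative".
frame = np.zeros((8, 8))
frame[4:8, 4:8] = 1.0

# Hypothetical candidate regions (top, left) the policy can attend to.
candidates = [(0, 0), (0, 4), (4, 0), (4, 4)]
logits = np.zeros(len(candidates))
baseline = 0.0
rng = np.random.default_rng(0)

for _ in range(500):
    action = rng.choice(len(candidates), p=softmax(logits))
    top, left = candidates[action]
    patch = crop_region(frame, top, left, 4)
    # Assumed reward: 1 if the patch would let a classifier succeed, else 0.
    reward = float(patch.mean() > 0.5)
    logits = reinforce_step(logits, action, reward, baseline)
    baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline

best = int(np.argmax(logits))
print("attended region:", candidates[best])
```

In this toy setting the only supervision is the scalar reward, which mirrors the claim that no attention annotations are needed. A real system would replace the synthetic reward with the outcome of a video classifier applied to the cropped region.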
Taxonomy
Topics: Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications · Human Pose and Action Recognition
