Streamlining Video Analysis for Efficient Violence Detection
Gourang Pathak, Abhay Kumar, Sannidhya Rawat, Shikha Gupta

TL;DR
This paper applies X3D, a 3D CNN-based model, together with pre-processing and clustering techniques to improve automated violence detection in surveillance videos, with applications in security and content filtering.
Contribution
The paper presents a novel 3D CNN approach with specialized pre-processing and clustering for more accurate violence detection in videos.
Findings
Effective scene classification between fight and non-fight
Improved localization of violent scenes
Demonstrated robustness across diverse video datasets
Abstract
This paper addresses the challenge of automated violence detection in video frames captured by surveillance cameras, specifically focusing on classifying scenes as "fight" or "non-fight." This task is critical for enhancing unmanned security systems, online content filtering, and related applications. We propose an approach using a 3D Convolutional Neural Network (3D CNN)-based model named X3D to tackle this problem. Our approach incorporates pre-processing steps such as tube extraction, volume cropping, and frame aggregation, combined with clustering techniques, to accurately localize and classify fight scenes. Extensive experimentation demonstrates the effectiveness of our method in distinguishing violent from non-violent events, providing valuable insights for advancing practical violence detection systems.
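To make the pre-processing steps more concrete, here is a minimal sketch of volume cropping and frame aggregation in Python with NumPy. It is not the authors' implementation. The function names, the bounding box, the clip length of 16, and the reading of "frame aggregation" as uniform sampling of frames into a fixed-length clip are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np


def sample_frame_indices(num_frames, clip_len):
    """Pick clip_len frame indices spread uniformly over the video (assumed strategy)."""
    return np.linspace(0, num_frames - 1, clip_len).round().astype(int)


def crop_volume(video, box):
    """Crop the same spatial box (x0, y0, x1, y1) from every frame.

    video has shape (frames, height, width, channels).
    """
    x0, y0, x1, y1 = box
    return video[:, y0:y1, x0:x1, :]


def make_clip(video, box, clip_len=16):
    """Aggregate sampled frames into one fixed-length clip, then crop it spatially."""
    idx = sample_frame_indices(video.shape[0], clip_len)
    return crop_volume(video[idx], box)


# Hypothetical input: 120 frames of 240x320 RGB video, all zeros.
video = np.zeros((120, 240, 320, 3), dtype=np.uint8)
clip = make_clip(video, box=(40, 20, 200, 180), clip_len=16)
print(clip.shape)  # (16, 160, 160, 3)
```

A clip shaped like this could then be resized and passed to a 3D CNN, which expects a fixed number of frames per input. In a real pipeline the box would presumably come from the tube extraction step rather than being hard-coded.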
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Human Pose and Action Recognition · Video Surveillance and Tracking Methods
