TL;DR
This paper introduces an efficient two-stream deep learning model using Separable ConvLSTM and MobileNet for violence detection in surveillance videos, emphasizing computational efficiency and high accuracy.
Contribution
The work presents a novel two-stream architecture with SepConvLSTM and pre-trained MobileNet, optimized for violence detection with improved efficiency and accuracy.
Findings
Outperforms prior methods on the RWF-2000 dataset by over 2% accuracy
Matches state-of-the-art results on smaller datasets
Achieves better computational efficiency
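The efficiency finding is tied to depthwise separable convolution, which factors a standard convolution into a per-channel (depthwise) step followed by a 1x1 (pointwise) step. The sketch below compares weight counts for a single convolution; the kernel size and channel counts are illustrative assumptions, not values taken from the paper, and biases are ignored.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k step plus a pointwise 1 x 1 step."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer shape, chosen only for illustration.
k, c_in, c_out = 3, 64, 64
standard = conv_params(k, c_in, c_out)
separable = separable_conv_params(k, c_in, c_out)
ratio = separable / standard   # equals 1/c_out + 1/k**2

print(standard, separable, round(ratio, 3))
```

For this assumed shape the separable form needs roughly an eighth of the weights. A ConvLSTM applies such a convolution at every gate, so the saving is repeated across the gates.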
Abstract
Automatically detecting violence from surveillance footage is a subset of activity recognition that deserves special attention because of its wide applicability in unmanned security monitoring systems, internet video filtration, etc. In this work, we propose an efficient two-stream deep learning architecture leveraging Separable Convolutional LSTM (SepConvLSTM) and pre-trained MobileNet, where one stream takes background-suppressed frames as inputs and the other stream processes the difference of adjacent frames. We employed simple and fast input pre-processing techniques that highlight the moving objects in the frames by suppressing non-moving backgrounds and capture the motion between frames. As violent actions are mostly characterized by body movements, these inputs help produce discriminative features. SepConvLSTM is constructed by replacing convolution operation at each gate of…
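The two input streams described in the abstract can be sketched with NumPy. This is a minimal illustration, not the authors' code: the function names are hypothetical, and using the per-pixel temporal mean as the background estimate is an assumption, since the abstract only says that non-moving backgrounds are suppressed.

```python
import numpy as np


def frame_differences(frames):
    """Difference of adjacent frames: (T, H, W, C) -> (T-1, H, W, C)."""
    return frames[1:] - frames[:-1]


def suppress_background(frames):
    """Remove static content from every frame.

    Assumption: the background is estimated as the per-pixel mean
    over time, so pixels that never change map to zero.
    """
    background = frames.mean(axis=0, keepdims=True)
    return np.abs(frames - background)


# Synthetic 8-frame clip with one pixel that never changes.
rng = np.random.default_rng(0)
clip = rng.random((8, 4, 4, 3)).astype(np.float32)
clip[:, 0, 0, :] = 0.5

diffs = frame_differences(clip)        # input for the motion stream
foreground = suppress_background(clip) # input for the appearance stream
```

In both outputs the static pixel is zeroed, which is the intended effect: only moving content is left for the network to process.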
Taxonomy
Methods: Pointwise Convolution · Depthwise Convolution · Tanh Activation · Sigmoid Activation · ConvLSTM · Depthwise Separable Convolution · Long Short-Term Memory · Convolution
