Learning to Detect Violent Videos using Convolutional Long Short-Term Memory
Swathikiran Sudhakaran, Oswald Lanz

TL;DR
This paper introduces a deep neural network that combines a CNN with a convolutional LSTM to detect violence in videos by capturing localized spatio-temporal features. The model is evaluated on three standard benchmark datasets.
Contribution
It proposes a deep learning architecture that aggregates frame-level CNN features with a convolutional LSTM and takes adjacent-frame differences as input, so that the model is forced to encode the changes occurring in the video.
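The frame-difference input can be illustrated with a minimal NumPy sketch. The function name `adjacent_frame_differences` and the `(T, H, W, C)` array layout are assumptions made for this example and are not taken from the paper; any preprocessing the authors apply before or after differencing is not shown.

```python
import numpy as np


def adjacent_frame_differences(frames: np.ndarray) -> np.ndarray:
    """Return differences between consecutive frames.

    frames: array of shape (T, H, W, C) holding T video frames (assumed layout).
    Returns an array of shape (T - 1, H, W, C) where entry t is
    frames[t + 1] - frames[t].
    """
    if frames.ndim != 4 or frames.shape[0] < 2:
        raise ValueError("expected at least two frames with shape (T, H, W, C)")
    # Cast to float first so that uint8 pixel values do not wrap around
    # when a pixel gets darker from one frame to the next.
    frames = frames.astype(np.float32)
    return frames[1:] - frames[:-1]
```

A static background yields all-zero differences, so only regions where something moves produce non-zero input.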
Findings
Achieved high recognition accuracy on three standard benchmark datasets.
Outperformed existing state-of-the-art methods.
Effectively captured local motion features in videos.
Abstract
Developing a technique for the automatic analysis of surveillance videos in order to identify the presence of violence is of broad interest. In this work, we propose a deep neural network for the purpose of recognizing violent videos. A convolutional neural network is used to extract frame level features from a video. The frame level features are then aggregated using a variant of the long short term memory that uses convolutional gates. The convolutional neural network along with the convolutional long short term memory is capable of capturing localized spatio-temporal features which enables the analysis of local motion taking place in the video. We also propose to use adjacent frame differences as the input to the model thereby forcing it to encode the changes occurring in the video. The performance of the proposed feature extraction pipeline is evaluated on three standard benchmark…
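To make the idea of an LSTM with convolutional gates concrete, here is a minimal NumPy sketch of one recurrent step. It follows the commonly used ConvLSTM formulation without peephole connections, which is an assumption and may differ in detail from the authors' variant. The helper names (`conv2d_same`, `convlstm_step`, `encode_sequence`), the `(H, W, C)` layout, and the packing of the four gates into one weight tensor are choices made for this illustration only.

```python
import numpy as np


def conv2d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Zero-padded 2D cross-correlation with stride 1.

    x: (H, W, C_in) feature map; w: (k, k, C_in, C_out) with odd k.
    Returns an array of shape (H, W, C_out).
    """
    k = w.shape[0]
    p = k // 2
    height, width, _ = x.shape
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros((height, width, w.shape[3]))
    # Accumulate the contribution of each kernel offset.
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + height, dx:dx + width, :] @ w[dy, dx]
    return out


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def convlstm_step(x, h, c, w_x, w_h, b):
    """One ConvLSTM step (standard form without peepholes, assumed here).

    x: (H, W, C_in) input; h, c: (H, W, C_h) hidden and cell state.
    w_x: (k, k, C_in, 4 * C_h); w_h: (k, k, C_h, 4 * C_h); b: (4 * C_h,).
    """
    # All four gates are computed by convolutions, so every spatial
    # location keeps its own hidden and cell state.
    z = conv2d_same(x, w_x) + conv2d_same(h, w_h) + b
    i, f, o, g = np.split(z, 4, axis=-1)
    c_next = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_next = sigmoid(o) * np.tanh(c_next)
    return h_next, c_next


def encode_sequence(features, w_x, w_h, b):
    """Run the cell over features of shape (T, H, W, C_in); return the final h."""
    hidden_channels = w_h.shape[2]
    h = np.zeros(features.shape[1:3] + (hidden_channels,))
    c = np.zeros_like(h)
    for x in features:
        h, c = convlstm_step(x, h, c, w_x, w_h, b)
    return h
```

Because the hidden state stays a spatial map instead of being flattened into a vector, the final state still says where in the frame a change occurred. That is the sense in which the abstract calls the captured features localized.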
