Beyond the Benchmark: Detecting Diverse Anomalies in Videos
Yoav Arad, Michael Werman

TL;DR
This paper introduces new datasets and a novel multi-frame anomaly detection method to identify diverse and complex anomalies in videos, surpassing traditional single-frame benchmark limitations.
Contribution
The paper presents HMDB-AD and HMDB-Violence datasets for diverse anomalies and proposes MFAD, a multi-frame detection method enhancing video anomaly detection capabilities.
Findings
MFAD outperforms existing models on new anomaly types
Datasets challenge models with complex, action-based anomalies
Experimental results validate the effectiveness of multi-frame features
Abstract
Video Anomaly Detection (VAD) plays a crucial role in modern surveillance systems, aiming to identify various anomalies in real-world situations. However, current benchmark datasets predominantly emphasize simple, single-frame anomalies such as novel object detection. This narrow focus restricts the advancement of VAD models. In this research, we advocate for an expansion of VAD investigations to encompass intricate anomalies that extend beyond conventional benchmark boundaries. To facilitate this, we introduce two datasets, HMDB-AD and HMDB-Violence, to challenge models with diverse action-based anomalies. These datasets are derived from the HMDB51 action recognition dataset. We further present Multi-Frame Anomaly Detection (MFAD), a novel method built upon the AI-VAD framework. AI-VAD utilizes single-frame features such as pose estimation and deep image encoding, and two-frame…
Peer Reviews
Decision·Submitted to ICLR 2024
Strengths of the MFAD approach: ++ Comprehensive Feature Extraction: MFAD extracts four diverse feature types, including object velocities, human pose estimations, deep image encodings, and deep video encodings, enabling a holistic analysis of video data. ++ Adaptive Density Score Calculation: Using Gaussian Mixture Models (GMM) for velocity features and k-nearest neighbors (kNN) for other high-dimensional features, it adapts the density score calculation to the nature of the features, enhanci
Looking at the manuscript, weaknesses are provided below. -- Complexity: MFAD's multi-stage process and diverse feature extraction can make it computationally demanding and challenging to implement in resource-constrained environments. -- Model Specificity: Utilizing specific video foundation models may reduce adaptability to different datasets or domains. -- Not Real-Time: The computational cost and the requirement for separate training/testing data make real-time application challenging.
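The first review notes that MFAD scores its high-dimensional features with k-nearest neighbors (kNN). The sketch below shows one common form of kNN scoring, assuming the score is the mean Euclidean distance from a test feature to its k nearest exemplars drawn from normal training data. The function name `knn_anomaly_score`, the toy 2-D vectors, and the value of k are illustrative assumptions, not the authors' implementation; the GMM scoring of velocity features is not shown.

```python
import math


def knn_anomaly_score(query, exemplars, k=1):
    """Mean Euclidean distance from `query` to its k nearest exemplars.

    A larger value means the query lies in a sparser region of the
    normal-data feature space, i.e. it looks more anomalous.
    """
    if not exemplars:
        raise ValueError("exemplars must be non-empty")
    distances = sorted(math.dist(query, e) for e in exemplars)
    k = min(k, len(distances))  # guard against k > number of exemplars
    return sum(distances[:k]) / k


# Hypothetical 2-D stand-ins for pose or deep-encoding features
# extracted from normal training videos.
normal_features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]

score_normal = knn_anomaly_score((0.05, 0.05), normal_features, k=2)
score_anomalous = knn_anomaly_score((3.0, 4.0), normal_features, k=2)
```

In this toy setup the far-away query receives the larger score, which is the behavior a density-based anomaly score is meant to have.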
(1) The paper addresses the limitation of current benchmark datasets for video anomaly detection and proposes two new datasets that allow for the detection of complex action-based anomalies. This expands the scope of what constitutes an anomaly and encourages further research on more comprehensive anomaly types. (2) The proposed method, MFAD, simply incorporates deep video encoding features and logistic regression to effectively detect both simple and complex anomalies. The experimental results
(1) The paper lacks a more detailed description of the datasets HMDB-AD and HMDB-Violence. It would be beneficial to provide more information on the distribution of normal and abnormal activities, and any specific challenges or characteristics of the datasets. (2) The method proposed in this paper is more like a simple patchwork combination that lacks sound and rigorous theoretical support. Moreover, the paper lacks a more detailed and visual explanation of the proposed method. (3) The article l
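The second review mentions that MFAD uses logistic regression but does not say exactly where it enters the pipeline. The sketch below assumes it fuses the per-feature anomaly scores (velocity, pose, image encoding, video encoding) into one final score. The toy scores, the labels, and the helper names `fit_logistic` and `fused_score` are hypothetical, and how training labels are obtained is not stated in the excerpt.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fused_score(w, b, scores):
    """Combine per-feature anomaly scores into one probability-like score."""
    return sigmoid(sum(wj * sj for wj, sj in zip(w, scores)) + b)


# Assumed toy data: each row is (velocity, pose, image, video) scores;
# label 1 marks an anomalous sample, 0 a normal one.
X = [(0.1, 0.2, 0.1, 0.1), (0.2, 0.1, 0.2, 0.1), (0.1, 0.1, 0.1, 0.2),
     (0.9, 0.8, 0.7, 0.9), (0.8, 0.9, 0.9, 0.8), (0.7, 0.9, 0.8, 0.9)]
y = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
low = fused_score(w, b, (0.1, 0.1, 0.1, 0.1))
high = fused_score(w, b, (0.85, 0.85, 0.85, 0.85))
```

A learned fusion like this lets the model weight each feature type by how informative it is, instead of summing the scores with equal weight.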
-Good overview of existing methods and datasets used in anomaly detection. -"Introducing" new videos to video anomaly detection for benchmarking. -Comprehensive and competitive results on the public benchmarks and significantly higher results compared to state-of-the-art on the two subsets of HMDB51. -Proper ablation study, which also shows the effect of video encoding features in Table 4.
Despite the interesting results, the paper's method reads like a simple extension of [1] that adds temporal features. Although the authors have cherry-picked videos from HMDB51 and suggest using them for anomaly detection, they claim these as their own data (Table 1), which is not correct. The ablation study shows that video encoder features alone produce almost the same results as the entire set of features on the subsets of HMDB51, so what is the point in inclusion of other fe
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Human Pose and Action Recognition · Network Security and Intrusion Detection
Methods: Focus · Logistic Regression
