Anomaly Detection in Road Traffic Using Visual Surveillance: A Survey
Santhosh Kelathodi Kumaran, Debi Prosad Dogra, Partha Pratim Roy

TL;DR
This survey reviews recent advances in anomaly detection in road traffic using visual surveillance, emphasizing learning methods, features, and scenarios with static cameras, and discusses future challenges and directions.
Contribution
It provides a comprehensive overview of the last six years of research on anomaly detection in road traffic, focusing on learning techniques and visual features.
Findings
Summarizes key learning methods used in anomaly detection.
Highlights important features and scenarios in recent research.
Discusses challenges and future research directions.
Abstract
Computer vision has evolved in the last decade as a key technology for numerous applications replacing human supervision. In this paper, we present a survey of visual surveillance research relevant to anomaly detection in public places, focusing primarily on roads. First, we revisit the surveys published in this field over the last 10 years. Since learning is the underlying building block of a typical anomaly detection system, we place greater emphasis on learning methods applied to video scenes. We then summarize the important contributions made during the last six years on anomaly detection, focusing primarily on features, underlying techniques, applied scenarios and types of anomalies using a single static camera. Finally, we discuss the challenges in computer vision-based anomaly detection techniques and some important future possibilities.
| Ref. | Focus | Explored research areas |
|---|---|---|
| Morris (2008) [124] | Video trajectory-based scene analysis | Scene modeling: Tracking, interest point study, activity path learning; Applications: People movement, traffic, parking lot, and entity interaction; Path learning: Preprocessing (normalization and dimensionality reduction), clustering approaches and used distance measures, path modeling, relevance of path feedback in low level systems; Activity analysis: Virtual fencing, speed profiling, path classification, abnormality detection, online activity analysis, object interaction characterization. |
| Tian (2011) [176] | Video processing techniques applied for traffic monitoring | Traffic parameters collection; Traffic incident detection; Vehicle detection scenarios: Background modeling and non-background modeling approaches, shadow detection and removal; Vehicle tracking, model-based classification, region, deformable template and feature study, tracking algorithms; Traffic incident detection and behavior understanding. |
| Buch (2011) [26] | Video analytics system for urban traffic | Applications: Vehicle counting, automatic number plate recognition, incident detection; Analytics system components; Foreground segmentation techniques: Frame differencing, background subtraction (averaging, single Gaussian, mode estimation, Kalman filter, wavelets), GMM, graph cuts, shadow removal, object-based segmentation; Top-down vehicle classification: Features (region based, contour based), machine learning techniques; Bottom-up approaches: Interest point descriptors, object classification; Tracking: Kalman filter, PF, S-T MRF, graph correspondence, event cones; Traffic analytic system: Urban (camera domain, three dimensional modeling), highways (detection and classification). |
| Sodemann (2012) [164] | Anomaly detection | Study on sensors: Visible-spectrum camera (low-level feature extraction and object level feature extraction), audio and infrared sensors; Learning methods: Unsupervised, supervised and a priori modeling; Classification algorithms: Dynamic Bayesian networks, Bayesian topic models, artificial neural networks, clustering, decision trees, fuzzy reasoning. |
| Sivaraman (2013) [162] | Vision-based vehicle detection, tracking and behavior analysis | Sensors: radar, lidar, camera; Vehicle detection: Monocular vision (camera placement, appearance features and classification, motion based approaches, vehicle pose), stereo vision (matching, motion-based approaches); Vehicle tracking: Monocular and stereo tracking, vision cue fusion, real-time challenges and system architecture, fusion with other modalities; Behavior analysis: context, vehicle maneuvers, trajectories, behavioral classification; Future direction of vehicle detection, tracking, their on-road behavior and public benchmarks. |
| Wang (2013) [194] | Multi-camera based surveillance | Multi-camera calibration; Topology computation; Multi-camera object tracking: Calibration, appearance cues, correspondence-based methods; Object re-identification: Feature studies, learning methods; Multi-camera activity analysis: Correspondence free methods, activity models, human action recognition; Cooperative video surveillance using active and static cameras; Background modeling and object tracking with active cameras. |
| Suriani (2013) [171] | Abrupt event detection | Human centered, vehicle centered and small area centered studies; Methods of detection: Single person, multiple person, vehicles, multi-view camera based. |
| Loce (2013) [144] | Traffic management | Vehicle mounted camera-based safety applications: Lane departure warning and lane change assistance, pedestrian detection, driver monitoring, adaptive warning systems; Efficiency studies: Traffic flow management, incident management, video based tolling; Security management: Alert and warning systems, traffic surveillance, recognizing and tracking vehicles of interest; Law enforcement: Studies on speed enforcement, violation detection at road intersections, vehicle mounted mobile camera based vehicle identification. |
| Vishwakarma (2013) [181] | Human activity recognition and behavior analysis | Application areas: Behavioral biometrics, content-based video analysis, security and surveillance, interactive applications, animation and synthesis; Object detection methods: Motion segmentation methods (background subtraction based, statistical, temporal differencing and optical flow-based) and object classification; Object tracking methods (region, contour, feature, model, hybrid and optical flow-based); Action recognition techniques: Hierarchical (statistical, syntactic and description based) and non-hierarchical approaches; Human behavior understanding: Supervised, semi-supervised and unsupervised models; Dataset description: Controlled and realistic environments and its realistic impact on video-based surveillance market. |
| Borges (2013) [25] | Human behavior analysis | Human detection methods: Appearance, motion and hybrid approaches; Action recognition approaches: Low-level and spatio-temporal interest points, mid and high-level, silhouettes features; Interaction recognition: One-to-one, group interactions, models; Datasets. |
| Liu (2013) [105] | Intelligent video systems and analytics | Video systems: Architecture (distributed/centralized), quality diagnosis, system adaptability (configuration, calibration, capability and scalability) analysis, data management and transmission methods; Analytics: Object attributes, motion pattern recognition, event and behavior analysis; Analytic methods: Intelligence and cooperative aspects, multi-camera view selections, statistical and networked analysis, learning and classification, 3-D sensing; Applications areas: Management, traffic control, transportation, intelligent vehicles, health-care, life sciences, security and military. |
| Zablocki (2013) [213] | Characteristics of intelligent video surveillance systems | System classification: Object detection, tracking and movement analysis technologies; Anomaly detection, identification and warning/alarming systems; Vehicle detection, traffic and parking lot analysis systems; Object counting systems; Integrated camera view handling systems; Privacy preserving systems; Cloud-based systems. |
| Tian (2015) [175] | Vehicle surveillance | Dynamic and static attribute extraction: Appearance and motion-based detection, tracking, recognition (license plate, type, color and logo), networked tracking of vehicles; Behavior understanding: Single camera study, trajectory (clustering, modeling and retrieval) and networked multi-camera-based, interesting region discovery; Image acquisition: Traffic scene characteristics, imaging technologies; ITS service study: Illegal activity and anomaly detection, security monitoring, electronic toll collection, traffic flow analysis, transportation planning and road construction, environment impact assessment. |
| Patil (2016) [140] | Video datasets for anomaly detection | Dataset classification: Traffic, subway, panic driven, pedestrian, abnormal activity, campus, train, sea, crowd. |
| Datondji (2016) [41] | Traffic monitoring at intersections | Camera based classification: Mono vision, omni vision and stereo vision; Vehicle sensing: Methodologies and datasets; Challenges: Initialization and preprocessing, vehicle detection and tracking; Vehicle detection methods: Candidate localization, verification; Vehicle tracking: Representation and tracking approaches: Region, contour, feature and model-based; Vehicle tracking algorithms: Matching, Bayesian; Challenges for intersection; Monitoring systems: Monocular vision and omni-directional vision-based, in-vehicle monitoring; Vehicle tracking: Roadside monitoring systems, in-vehicle monitoring systems; Vehicle behavior analysis. |
| Li (2017) [101] | Spatio-temporal interest point (STIP) detection algorithms | STIPs algorithms; Detection challenges; Applications: Human activity detection, anomaly detection, video summarization and content based video retrieval. |
| Shirazi (2017) [158] | Intersections analysis from safety perspective | Vehicular behavior: Trajectories, vehicle speed, acceleration, turn recognition; Driver behavior: Turning intention, aggression, perception reaction time; Pedestrian behavior: Motion prediction, waiting time, walking speed, crossing speed, and choices; Safety assessment: Gap analysis, threat, risk, conflict, accident; Intersection safety systems: Driver assistance systems (driver perception enhancement, action suggestion and human driver interface, advanced vehicle motion control delegation), infrastructure-based systems (roadside warning systems, dilemma zone protection systems, decision support systems). |
| Ahmed (2018) [8] | Trajectory-based analysis | Trajectory analysis: Datasets, extraction, representation, applications; Clustering algorithms; Event detection: Methods and learning procedures; Localization of abnormal events: Methods and learning procedures; Video summarization and synopsis generation. |
| Lopez-Fuentes (2018) [110] | Emergency management using computer vision | Emergency classification: Natural, human made (road accident, crowd related, weapon threat, drowning, injured person, falling person); Monitoring objective: Prevention, detection, response and understanding; Acquisition methods: Sensor location, sensor types, acquisition rate and sensor cost; Feature extraction algorithms: Color, shape and texture, temporal (wavelet, optical flow, background modeling and subtraction, tracking) and convolution features; Semantic information extraction using machine learning: Artificial neural networks, deep learning, support vector machines (SVMs), hidden Markov models (HMMs), fuzzy logic. |
| Mabrouk (2018) [115] | Abnormal behavior recognition | Behavior representation; Anomalous behavior recognition methods: Modeling frameworks and classification methods, scene density and moving object interaction in crowded and uncrowded scenes; Performance evaluation: Datasets and metrics; Existing surveillance systems. |
| Learning type | Method | Description and applied context |
|---|---|---|
| Supervised | Hidden Markov Model (HMM) [17] | A supervised statistical Markov model in which the modeled system is assumed to be a Markov process with hidden states: Used for anomaly detection in [20, 189]. |
| | Support Vector Machine (SVM) [61] | A representation of data points in space, mapped such that separate categories are divided by a clear separation between them: A special class of SVM, namely one-class SVM (OCSVM), has been used extensively for anomaly detection [157]. |
| | Gaussian Regression (GR) [147] | A generic supervised learning method designed to solve regression and probabilistic classification problems: Used in [34, 153] for anomaly detection from videos. |
| | Convolutional Neural Networks (CNN) [54] | A class of deep neural networks, usually applied to analyze visual imagery: Due to its ability to extract semantic-level features from the input, it has become popular in many applications including anomaly detection [68, 118]. |
| | Multiple Instance Learning (MIL) [13] | A special learning framework that deals with uncertainty of instance labels: Instead of receiving a set of individually labeled instances, the learner receives a set of labeled bags, each containing many instances. If all the instances in a bag are negative, the bag is labeled negative. If there is at least one positive instance, the bag is labeled positive. It has been used for anomaly detection in [207, 168]. |
| | Long short-term memory (LSTM) networks [63] | A special kind of recurrent neural network (RNN) used in time series applications: In [113, 112, 118, 166], it has been used for anomaly detection. |
| | Fast Region-based CNN (Fast R-CNN) [53] | A deep neural network (DNN) variant that performs object classification more efficiently than conventional CNNs: Used for anomaly detection in [62]. |
| Unsupervised | Latent Dirichlet Allocation (LDA) [23] | A topic model using statistical analysis to retrieve the underlying topic distribution of documents: Used for modeling visual words of videos for anomaly detection [73]. |
| | Probabilistic Latent Semantic Analysis (pLSA) [64] | A model for representing co-occurrence information under a probabilistic framework: Used in [84] for anomaly detection. |
| | Hierarchical Dirichlet Process (HDP) [174] | A nonparametric Bayesian approach, built on LDA, to cluster data: Used in data modeling and anomaly detection [78]. |
| | Gaussian Mixture Model (GMM) [19] | A probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters: Used for anomaly detection in [99, 200]. |
| | Density-based spatial clustering of applications with noise (DBSCAN) [48] | A density-based non-parametric clustering algorithm used extensively for modeling and learning data patterns: Used for anomaly detection in [145]. |
| | Fisher kernel method [142] | A function to measure similarity of two objects on the basis of sets of measurements for each object and a statistical model: Used to obtain trajectory feature representation in [186]. |
| | Principal component analysis (PCA) [75] | A statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables: Used for dimensionality reduction in [187]. |
| | Particle Swarm Optimization (PSO) [85] | A population-based stochastic optimization technique: Used in [77] to obtain an optimized motion descriptor from a set of particles having individual motion characteristics. |
| | Generative Adversarial Networks (GAN) [55] | A class of artificial intelligence algorithms used in unsupervised machine learning, implemented by a system of two neural networks (generator and discriminator) contesting with each other in a zero-sum game framework: Used for anomaly detection in [148]. |
| Hybrid | HDP+HMM | A hybrid model: Used for representing sub-trajectories in [207] for anomaly detection using MIL. |
| | GAN-LSTM [92] | A hybrid model: Fake frames required for adversarial learning used in [119] are generated using bidirectional Conv-LSTM [204]. |
| | CNN-LSTM [119] | A hybrid model: Prediction-based anomaly detection with the help of CNN-LSTM. |
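The MIL bag-labeling rule described in the table can be illustrated with a short sketch. This is only an illustration of the rule itself, not the method of [207] or [168]: the function names, the example bags, and the use of max-pooling to score a bag are assumptions introduced here.

```python
def bag_label(instance_labels):
    """A bag is positive (anomalous) if at least one instance is positive."""
    return any(instance_labels)


def bag_score(instance_scores):
    """Score a bag by its highest-scoring instance (max-pooling, an assumed choice)."""
    return max(instance_scores)


# Hypothetical bags: each video is a bag of per-segment labels (True = anomalous).
normal_video = [False, False, False, False]
anomalous_video = [False, True, False, False]

print(bag_label(normal_video))
print(bag_label(anomalous_video))

# Hypothetical per-segment anomaly scores in [0, 1].
print(bag_score([0.1, 0.2, 0.05]))
print(bag_score([0.1, 0.9, 0.2]))
```

In a video setting, only the bag (video-level) label needs to be annotated, which is why MIL suits weakly labeled surveillance footage.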
| Specific Techniques | Ref. |
|---|---|
| SVM | [163, 143, 2, 190, 70, 16, 160] |
| Sparse | [208, 21, 111, 113, 185, 149] |
| PCA | [96] |
| Autoencoder | [161, 59, 37, 150] |
| Regression | [187, 153] |
| Density-based | [50, 114] |
| Clustering-based | [51, 131, 179] |
| Statistical methods | [72, 35, 99] |
| Prediction | [20, 118, 88, 10] |
| Bayesian Networks | [71] |
| Fuzzy logic-based | [201, 98] |
| Hybrid | [207, 150, 107, 206, 123, 62, 177, 5, 66] |
| Relative Density | [107] |
| Heuristic | [199, 30, 211, 93, 167, 117, 69] |
| Ref. | Features | Learning | Anomaly Criteria | Highlight |
|---|---|---|---|---|
| Yang (2013) [207] | Sub-trajectories | Multi instance learning | Nearest neighborhood based approach with Hausdorff distance-based threshold for anomaly detection. | Sub-trajectories-based local anomaly detection capability. |
| Roshtkhari (2013) [152] | 3D Spatio-temporal volume | Code-book model | Threshold applied on likelihood/saliency map. | Fast anomaly localization requiring less training data. Does not require any feature analysis, background/foreground segmentation and tracking, and can be applied for real-time applications. |
| Li (2014) [96] | MDTs from spatio-temporal patches | Dynamic Texture Model | Threshold on negative log-likelihood on temporal mixture of dynamic textures for temporal anomaly and threshold on the saliency for spatial anomalies. | Detects both temporal and spatial anomalies in complex crowded scenes. |
| Kaltsa (2014) [77] | HOSA+HOGs over image patches | SVM | OCSVM based anomaly detection. | Robustness to local noise and anomaly detection in crowded scenes. |
| Jeong (2014) [73] | Trajectories and pixel velocities | Hybrid (LDA + GMM) | Threshold on the probability score. | Thorough study conducted at intersections and roads for traffic pattern analysis. |
| Zhu (2014) [222] | Histogram of optical flow features | Sparse coding | Threshold on reconstruction cost used as anomaly measure. | The method can detect both local and global anomalies. Experiments were not conducted on traffic junctions, though the method could be suitable for busy junctions. |
| Kaltsa (2015) [76] | Hybrid (HOS + HOG + PSO) | SVM | Support Vector Data Description (SVDD) method [173] for anomaly detection. | Swarm intelligence is exploited for the extraction of robust motion and appearance features to model and to detect anomalies. |
| Mousavi (2015) [126] | Histogram of Oriented Tracklets (HOT) | LDA | Log-likelihood based fixed threshold of visual words for anomaly detection. | Comprehensive evaluation using topic model based anomaly detection and localization for a wide range of real-world videos. |
| Cheng (2015) [34] | Spatio-temporal interest points (STIPs) [43] | Gaussian regression | Local anomalies: k-NN-based likelihood threshold with respect to the visual vocabulary of STIP codebook. Global anomalies: Using global negative log likelihood threshold. | STIPs effectively used for local and global anomaly detection. |
| Medel (2016) [118] | Automatically extracted video features with CNN. | Conv-LSTM | Reconstruction error between predicted and actual output. | Effective for recognizing abnormalities when the training data is loosely supervised to contain mostly normal events. |
| Zhang (2016) [217] | Histogram of optical flow | Clustering | Anomaly score based on Hamming distance. | Locality sensitive hashing filters used in anomaly detection. |
| Lan (2016) [91] | HOG | Heuristic method | Anomalies detected using relative speeds of detected objects. | An interesting study about abandoned objects that could possibly cause traffic accidents or some other untoward incidents. |
| Hasan (2016) [59] | Handcrafted HOG+HOF [184] and automatic CNN extracted features | Dual Autoencoder model | Anomaly score, namely regularity score derived using reconstruction error in autoencoders. | A regularity score, used as a measure of normalcy in a scene, derived using both hand crafted features and automatic features using fully convolutional feed-forward autoencoder. |
| Hinami (2017) [62] | Deep features from CNN | Multi-task Fast R-CNN. | Anomaly detection with a combination of semantic features using (a) nearest neighbor-based method (NN), (b) OCSVM and (c) KDE. | It addresses the problem of joint detection and recounting of abnormal events in videos in presence of false alarms. |
| Wen (2017) [200] | Object (velocity and direction) | GMM | Model based anomaly detection. | Speeding event detection that could be relevant on roads, though the authors tested the method in indoor scenarios. |
| Ravanbakhsh (2017) [148] | Optical flow frames + normal frames | GAN | Anomaly score as a fusion of optical flow and appearance reconstruction error. | Global and local anomaly detection in crowded scenes. |
| Lin (2017) [104] | 3D-Tube | SVM | Contextual information embedded in trajectory thermal transfer fields using OCSVM. | The first anomaly detection approach of its kind to use thermal fields, capable of detecting contextual anomalies. |
| Liu (2017) [108] | Automatically extracted optical flow, intensity and gradient features. | GAN | Peak Signal to Noise Ratio (PSNR) score based on optical flow, intensity, gradient loss. | DNN-based prediction [151] and a GAN-based [3] discriminator applied to optical flow frames derived using [44], aiming at robustness to the uncertainty in normal events and sensitivity to abnormal events. |
| Colque (2017) [38] | HOFME | Histogram based model | Nearest Neighbor threshold. | A new feature descriptor HOFME that could handle diverse anomaly scenarios as compared with conventional features. |
| Giannakeris (2018) [52] | Trajectory Fisher vector | SVM | Anomaly score derived from the Fisher vector using OCSVM. | Anomaly detection done using robust optical flow descriptors of the detected vehicles with the use of DNNs and Fisher vector representations from spatiotemporal visual volumes. |
| Lee (2018) [92] | Real and Fake frames | GAN | Abnormality score derived using the losses of the generator and the discriminator. | Can detect anomalies from dataset containing complex motion and frequent occlusions. |
| Kaltsa (2018) [78] | Code words of spatio-temporal regions | Multiple HDPs | Confidence score of reconstruction of region clips. | Both local and global anomaly detection using super-pixels and interest point tracking [6] applied on real-life videos. |
| Sultani (2018) [168] | Video clips | Deep MIL Ranking Model | An anomaly score using sparsity and smoothness constraints. | A generic method applied on a variety of real-life scenarios. |
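Several anomaly criteria in the table reduce to the same pattern: compare a reconstructed or predicted frame with the actual frame, turn the error into a per-frame score, and apply a threshold. The sketch below illustrates that pattern with a PSNR-based score. It is not the exact formulation of [59] or [108]: frames are assumed to be flat lists of pixel values in [0, 1], and the min-max normalization, the threshold of 0.5 and all names are assumptions made for illustration.

```python
import math


def mse(predicted, actual):
    """Mean squared error between two frames given as flat lists of pixel values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def psnr(predicted, actual, max_value=1.0):
    """Peak signal-to-noise ratio in dB; higher means a better prediction."""
    error = mse(predicted, actual)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


def normalize_scores(values):
    """Min-max normalize per-frame values to [0, 1] over one video."""
    low, high = min(values), max(values)
    if high == low:
        return [1.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical 4-pixel frames: one actual frame and three predictions of it,
# ordered from accurate (normal event) to poor (possible anomaly).
actual = [0.5, 0.5, 0.5, 0.5]
predictions = [
    [0.5, 0.5, 0.5, 0.6],
    [0.4, 0.6, 0.4, 0.6],
    [0.1, 0.9, 0.1, 0.9],
]

psnr_values = [psnr(p, actual) for p in predictions]
regularity = normalize_scores(psnr_values)

# Frames whose normalized score falls below the (assumed) threshold are flagged.
flags = [score < 0.5 for score in regularity]
print(flags)
```

A low normalized score means the model, trained on normal events, predicted the frame poorly, which is the signal these methods treat as evidence of an anomaly.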
| Ref. | Technique | Scene | Anomalies | Datasets |
|---|---|---|---|---|
| Yang (2013) [207] | Multi instance learning | Lobby. | One person walking, browsing, resting, slumping or fainting, leaving bags behind, people/groups meeting, walking together and then splitting up and two people fighting. | CAVIAR. |
| Roshtkhari (2013) [152] | Code-book (Sparse) model | Subway, walkway. | Abnormal walking patterns, crawling, jumping over objects, falling down, non-pedestrians on a walkway, walking in the wrong direction, irregular interactions between people and some other events including sudden stopping, running fast and loitering. | UCSD (Ped1, Ped2), Bellview and Person. |
| Jeong (2014) [73] | LDA + GMM | Junctions, walkway, roads, public gathering area. | Illegal U-turn, vehicle in opposite direction, disordering in the traffic signal, over speed on a pavement, unusual crowd speed, a car stops on a railway. | UCSD, UMN, MIT, QMUL and in-house datasets. |
| Li (2014) [96] | Dynamic Texture model | Walkways, junction. | Non-pedestrian entities in the walkways, people walking across a walkway or in the surrounding grass, U-turn. | UCSD (Ped1, Ped2), U-turn and UMN. |
| Mo (2014) [123] | Sparsity Model + OCSVM | Junction, road, parking lot. | Man suddenly falls on floor, vehicle almost hits a pedestrian, car violates the stop sign rule, car fails to yield to oncoming car while turning left, driver backs his car in front of stop sign. | i-LIDS, CAVIAR and In-house dataset namely XEROX. |
| Patino (2014) [141] | Statistical with heuristic approach | Parking lot, road intersection. | Unusual object trajectories such as U-turn, vehicle stopping at pedestrian way, person stopping between two lanes outside zebra passages, person crossing lanes outside zebra passages, loitering and vehicle/person staying at a place for longer duration. | ARENA, CAVIAR and MIT trajectory dataset. |
| Akos (2014) [10] | Hybrid (HMM + SVM + k-NN) | Intersection. | Collision, nearby passes. | NGSIM and AIRS. |
| Wang (2014) [192] | OCSVM | Walkway, public gathering place. | Local dispersion of crowds. | PETS2009 and UMN. |
| Yun (2014) [211] | Motion interaction field (MIF) symmetry model | Junction. | Accident detection. | Car accident. |
| Xia (2015) [202] | Low rank approximation on motion matrix created using optical flows. | Road, intersection. | Accident detection. | In-house dataset. |
| Cheng (2015) [34] | Gaussian regression | Road, walkways, subway, intersection. | Non-pedestrians appearing in walkway, chase, fight, run together, traffic interruption, jaywalk, illegal U-turn, strange driving. | UCSD (Ped1), Behave and QMUL. |
| Xu (2015) [2] | Hybrid (DNN + Autoencoder + OCSVM) | Walkways. | Non-pedestrians appearing in walkway. | UCSD (Ped1, Ped2). |
| Kaviani (2015) [84] | Hybrid (LDA+STC+pLSA+FSTM) | Roadways, junctions. | Accident detection. | QMUL and in-house datasets. |
| Nguyen (2015) [134] | Bayesian non-parametric | Junctions. | Street fight, loitering, truck-unusual stopping, big truck blocking camera. | MIT. |
| Pathak (2015) [138] | pLSA | Junction, highway, roadways. | Car stops after the stop-line, jaywalk, vehicle abruptly crossing the road. | Idiap, highway (in-house) and i-LIDS. |
| Medel (2015) [119] | ConvLSTM | Walkways, roadways. | People walking perpendicular to the walkway or off the walkway, movement of non-pedestrian entities and anomalous pedestrian motions. | UCSD (Ped1, Ped2) and Avenue. |
| Zhou (2016) [220] | CNN | Junction, walkways, dispersing crowd. | U-turn, unexpected presence of vehicles. | UCSD, UMN, and U-turn. |
| Zhang (2016) [216] | Hybrid (Histogram of Optical flow and Support Vector Data Description) | Walkways. | Non-pedestrians on walkways. | UCSD Ped1. |
| Xu (2017) [205] | OCSVM with SDAE features | Walkways. | Non pedestrians on walkways. | UCSD. |
| Vishnu (2017) [180] | Hybrid (MLR+DNN+vehiclecount) | Highway, Roadway, Junction. | Congestion detection, ambulance detection, accident detection. | In-house datasets. |
| Liu (2017) [108] | Heuristic | Roadways, walkways, junction. | Throwing objects, loitering and running, non pedestrians on walkways, presence of people at unexpected area of road. | Avenue, UCSD Ped1, UCSD Ped2 and ShanghaiTech. |
| Giannakeris (2018) [52] | SVM | Roadways. | Car crashes, stalled vehicles. | NVIDIA CITY. |
| Chebiyyam (2017) [31] | Heuristic using SVM and Region Association Graph | Parking lot, walkways. | Object encircling a particular region, target switching between two or more regions for a sustained period of time. | MIT Parking trajectory, Avenue and a custom dataset. |
| Yun (2017) [212] | Sparse learning using motion interaction field [211] | Junction, roadways, public gathering area. | Car accidents, crowd riots, and uncontrolled fighting. | BEHAVE, UMN and Car accident. |
| Wang (2018) [186] | Sparse topic model | Junction, roadways. | Car deviating from normal pattern, conflicting patterns, vehicle suddenly interrupting normal pattern, jaywalk, vehicle retrograde, pedestrian near-collisions with vehicle. | i-LIDS and QMUL. |
| Kaltsa (2018) [78] | HDP | Intersections. | Jaywalking, illegal U-turns, wrong vehicle direction, traffic break. | QMUL, Idiap and U-turn. |
| Sultani (2018) [168] | Deep MIL Ranking Model | Intersection, roadways, walkways. | Abuse, arrest, arson, assault, accident, burglary, fighting, robbery. | UMN, UCSD (Ped1, Ped2), Avenue, Subway, BOSS, Abnormal Crowd, and a set of local datasets. |
| Type | Ref. |
|---|---|
| Online | [185, 190, 150, 187, 107, 199, 52, 206, 108, 155, 179, 208, 182, 200, 167, 38, 31, 120, 180, 160, 89, 37, 50, 62, 205, 91, 5, 216, 49, 139, 218, 172, 32, 220, 60, 51, 118, 2, 131, 76, 34, 210, 134, 203, 33, 177, 153, 88, 202, 146, 22, 154, 192, 123, 10, 93, 222, 73, 66, 211, 95, 45, 103, 128, 70, 94, 90, 207, 152] |
| Soft-real time | [168, 161, 78, 99, 217, 138, 193, 84, 145, 16, 83, 111, 35] |
| Offline | [30, 104, 117] |
Anomaly Detection in Road Traffic Using Visual Surveillance: A Survey
Kelathodi Kumaran Santhosh, Debi Prosad Dogra, and Partha Pratim Roy
K. K. Santhosh and D. P. Dogra are with the School of Electrical Sciences, Indian Institute of Technology Bhubaneswar, Odisha, India. P. P. Roy is with the Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, India.
Index Terms:
Computer vision, Anomaly detection, Road traffic analysis, Learning methods.
I Introduction
With the widespread use of surveillance cameras in public places, computer vision-based scene understanding has gained a lot of popularity amongst the CV research community. Visual data contains rich information compared to other information sources such as GPS, mobile location, radar signals, etc. Thus, it can play a vital role in detecting/predicting congestions, accidents and other anomalies apart from collecting statistical information about the status of road traffic.
Several computer vision-based studies have been conducted focusing on data acquisition [175], feature extraction [80, 164], scene learning [124, 14, 67, 36], activity learning [181], behavioral understanding [162, 15], etc. These studies primarily discuss on aspects such as scene analysis, video processing techniques, anomaly detection methods, vehicle detection and tracking, multi camera-based techniques and challenges, activity recognition, traffic monitoring, human behavior analysis, emergency management, event detection, etc.
Anomaly detection is a sub-domain of behavior understanding [175] from surveillance scenes. Anomalies are typically aberrations of scene entities (vehicles, human or the environment) from the normal behavior. With the availability of video feeds from public places, there has been a surge in the research outputs on video analysis and anomaly detection [162, 164, 158, 115]. Typically anomaly detection methods learn the normal behavior via training. Anything deviating significantly from the normal behavior can be termed as anomalous. Vehicle presence on walkways, a sudden dispersal of people within a gathering, a person falling suddenly while walking, jaywalking, signal bypassing at a traffic junction, or U-turn of vehicles during red signals are a few examples of anomalies. Anomaly detection frameworks typically use unsupervised, semi-supervised or unsupervised learning. In this survey, we mainly explore anomaly detection techniques used in road traffic scenarios focusing on entities such as vehicles, pedestrian, environment and their interactions.
We have noted that the scope of the study should cover the nature of input data and their representations, feasibility of supervised learning, types of anomalies, suitability of the techniques in application contexts, anomaly detection outputs and evaluation criteria. We present this survey from the above perspectives. A typical anomaly detection framework is presented in Fig. 1. Usually, anomaly detection systems work by learning the normal data patterns to build a normal profile. Once the normal patterns are learned, anomalies can be detected with the help of established approaches [137, 97]. The output of the system can be a score, typically in the form of a metric, or a label that indicates whether the data is anomalous or not.
Some examples of anomaly detection results are shown in Fig. 2.
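The learn-then-score flow of the framework in Fig. 1 can be sketched as follows. This is a minimal illustration only, assuming one-dimensional feature values (for example, vehicle speeds) and a z-score threshold of 3; the class name `NormalProfile`, its methods and the threshold are hypothetical and are not taken from any surveyed method.

```python
import statistics


class NormalProfile:
    """Hypothetical normal profile: mean and spread of a 1-D feature."""

    def __init__(self, normal_samples):
        # Training step: summarise samples that are assumed to be normal.
        self.mean = statistics.fmean(normal_samples)
        self.std = statistics.pstdev(normal_samples)

    def score(self, x):
        """Anomaly score: absolute z-score of x under the normal profile."""
        if self.std == 0:
            return 0.0 if x == self.mean else float("inf")
        return abs(x - self.mean) / self.std

    def label(self, x, threshold=3.0):
        """Binary output: True if the score exceeds the (assumed) threshold."""
        return self.score(x) > threshold


# Usage: speeds (km/h) observed during normal traffic, then two test values.
profile = NormalProfile([38, 40, 42, 41, 39, 40])
print(profile.score(41), profile.label(41))  # low score, labelled normal
print(profile.score(5), profile.label(5))    # high score, labelled anomalous
```

The two outputs mirror the two output types mentioned above: `score` returns a metric, while `label` returns a binary decision derived from it.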
I-A Recent Surveys
During the last 10 years or so, a few interesting surveys have been published in this field of research. Authors of [124] have explored object detection, tracking, scene modeling and activity analysis using video trajectories. The study presented in [176] covers vehicle detection, tracking, behavior understanding and incident detection from the purview of intelligent transportation systems (ITS). Authors of [26] have conducted an in-depth study of traffic analysis frameworks under different taxonomies, with pointers toward integrating information from multiple sensors. The review presented in [164] is possibly the first work covering anomaly detection techniques. It covers sensors, entities, feature extraction methods, learning methods and scene modeling to detect anomalies. The survey in [162] takes an object-oriented approach from the perspective of vehicle-mounted sensors, covering object detection, tracking and behavior analysis and detailing the progress of the last decade. The multi-camera study presented in [194] covers research related to surveillance in multi-camera setups. Authors of [171] discuss events that require immediate attention and occur unintentionally, abruptly and unexpectedly; these are considered a subset of anomalous events. The research presented in [144] discusses safety, security and law enforcement related applications from the computer vision perspective. The review presented in [181] discusses the elements of human activity and behavioral understanding frameworks. Authors of [25] present research on human behavioral understanding through actions and interactions of human entities. Intelligent video systems, with a focus on analytics, have been studied in [105]. Surveillance systems with specific application areas have been presented in [213]. Authors of [175] systematically divide road traffic analysis into four layers, namely image acquisition, dynamic and static attribute extraction, behavioral understanding and ITS services.
Datasets used for anomaly detection have been covered in [140]. Traffic monitoring using different types of sensors has been discussed in [41]. Algorithms used for spatio-temporal point detection and their applications in the vision domain have been covered in [101]. Traffic entities have been studied from the perspective of safety in [158]. Authors of [8] explore studies on video trajectory-based analysis and applications. Authors of [110] discuss various ways of handling emergency situations by assessing risk, preparedness, response, recovery and mitigation, using information extracted from visual features with the help of various learning mechanisms. In [115], the authors have presented anomalous human behavior recognition work with a focus on behavior representation and modeling, feature extraction techniques, classification and behavior modeling frameworks, performance evaluation techniques, and datasets with examples of video surveillance systems. Table I summarizes the major computer vision-based studies done during the last 10 years. In our survey, we particularly focus on the studies on anomaly detection that are relevant to road traffic scenarios.
Anomalies are contextual in nature. The assumptions used in anomaly detection cannot be applied universally across different traffic scenarios. We analyze the capabilities of anomaly detection methods used in road traffic surveillance from the perspective of data. In the process, we categorize the methods according to scene representation, features employed, and the models and approaches used.
The rest of the paper is organized as follows. First, the background and the terminologies used in the paper are introduced in Section II-A. Anomaly detection related visual scene learning methods are presented in Section II-B. Anomaly detection approaches and classification are elaborated in Section II-C. Features used for anomaly detection and application areas are presented in Sections II-D and II-E, respectively. A critical analysis of the existing methods followed by discussions on the challenges and future possibilities of anomaly detection are presented in Section III. We conclude the paper in Section IV.
II Computer Vision Guided Anomaly Detection Studies
II-A Background and Terminologies
Features are assumed as data in the present context and are represented in the form of feature descriptors. Data typically occupy a position in a multi-dimensional space depending on the feature descriptor length.
Anomalies are data patterns that do not conform to a well-defined notion of normal behavior [29]. Other terms, such as outlier and novelty, are used as synonyms for anomaly in various application areas [58]. In this paper, we use anomaly or outlier in the subsequent part.
II-A1 Anomaly Classification
Traditionally, anomalies are classified as point anomalies [152, 96, 73], contextual anomalies [165, 210] and collective anomalies [192, 34]. Data correspond to a point anomaly if they are far away from the usual distribution. For example, a non-moving car on a busy road can be termed a point anomaly. Contextual anomalies correspond to data that may be termed normal in a different context. For example, in slow-moving traffic, if a biker rides faster than others, we may term it an anomaly. Conversely, on a less dense road it may be normal behavior. Collective anomalies arise when a group of data instances together is anomalous even though each instance may be normal individually. For example, a group of people dispersing within a short span of time can be termed a collective anomaly.
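The biker example above, where the same speed is anomalous in slow-moving traffic but normal on a less dense road, can be illustrated with a small sketch. It assumes that each observation is a (context, speed) pair, that the hypothetical context labels "dense" and "sparse" are known in advance, and that a z-score threshold of 3 is acceptable; none of these choices comes from the surveyed papers.

```python
import statistics
from collections import defaultdict


def fit_context_profiles(observations):
    """Group (context, speed) pairs and keep mean/std of speed per context."""
    by_context = defaultdict(list)
    for context, speed in observations:
        by_context[context].append(speed)
    return {
        context: (statistics.fmean(speeds), statistics.pstdev(speeds))
        for context, speeds in by_context.items()
    }


def is_contextual_anomaly(profiles, context, speed, threshold=3.0):
    """Judge a speed only against the profile of its own context."""
    mean, std = profiles[context]
    if std == 0:
        return speed != mean
    return abs(speed - mean) / std > threshold


# Usage: the same speed is judged differently depending on traffic density.
training = [("dense", s) for s in (10, 12, 11, 9, 13)]
training += [("sparse", s) for s in (55, 60, 65, 58, 62)]
profiles = fit_context_profiles(training)
print(is_contextual_anomaly(profiles, "dense", 60))   # fast rider in slow traffic
print(is_contextual_anomaly(profiles, "sparse", 60))  # same speed on an open road
```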
In the context of visual surveillance, it is common to see anomalies classified as local and global anomalies [57, 68, 139, 207, 138, 154]. Global anomalies can be present in a frame or a segment of the video without specifying where exactly they have happened [57, 68, 139]. Local anomalies usually happen within a specific area of the scene, but may be missed by global anomaly detection algorithms [207, 138, 154]. Some methods can detect both global and local anomalies [190, 5, 34, 78, 222].
II-A2 Challenges and Scope of Study
The key challenges in anomaly detection are: (i) defining a representative normal region, (ii) boundaries between the normal and anomalous regions may not be crisp or well defined, (iii) the notion of anomaly is not the same in all application contexts, (iv) limited availability of data for training and validation, (v) data is often noisy due to inaccurate sensing, and (vi) normal behavior evolves over time.
This survey is based on studies conducted on videos captured through a static camera. Anomaly detection using multiple cameras includes additional challenges, and the frameworks can be completely different [12, 57].
II-B Learning Methods
Learning the normal behavior is not only relevant for anomaly detection, but also for diverse use cases. Pattern analysis [47], classification [129], prediction [125], density estimation [4], and behavior analysis [15] are a few amongst them.
Learning methods can be classified as supervised, unsupervised or semi-supervised. In supervised learning, the normal profile is built using labeled data [79, 74, 81, 159]. It is typically applied for classification and regression related applications. In unsupervised learning, the normal profile is structured from the relationships between elements of the unlabeled dataset [166]. Semi-supervised learning primarily uses unlabeled data, with supervision from a small amount of labeled data that specifies example classes known a priori [170, 106]. If learning happens through interactive labeling of data as and when the label information becomes available, it is called active learning [179, 42, 109, 134]. Such methods are used when unlabeled data are abundant and manual labeling is expensive. Reinforcement learning, a relatively new learning paradigm in computer vision, is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward [195, 191, 215]. Some of the important works are summarized in Table II.
Learned models have been used not only in feature extraction, but also in object detection [188], classification [82], activity recognition [130], segmentation [86], tracking [183], entity re-identification [102], object interaction analysis [209], anomaly detection [77], etc. Table III presents some important learning methods used in anomaly detection.
II-C Anomaly Detection Approaches
Anomaly detection approaches can be classified as depicted in Fig. 3.
II-C1 Model-based
Model-based approaches learn the normal behavior of data by representing them in terms of a set of parameters. Statistical approaches are generally used to learn the parameters of the model as they try to fit the data into a stochastic model. Statistical approaches may be either parametric or nonparametric. Parametric methods assume that the normal data is generated by a parametric distribution with an associated probability density function. Examples are Gaussian mixture models [99], regression models [34], etc. In nonparametric statistical models, the structure is not defined a priori; instead, it is determined dynamically from the data. Examples are histogram-based [216], Dirichlet process mixture models (DPMM) [131], Bayesian network-based models [22], etc. A Bayesian network estimates the posterior probability of observing a class label from a set of normal class labels and the anomaly class labels, given a test data instance. The class label with the largest posterior is regarded as the predicted class for the given test instance. Typically, topic model-based anomaly detection methods use Bayesian nonparametric approaches [126, 84]. DNN-based models can also be categorized under parametric models, where the parameters are the weights and biases of the neural networks [154, 28, 112]. However, some researchers consider them classification approaches [97], and in practice many approaches (statistical, classification, information theoretic, reconstruction-based) are used in anomaly detection. Neural network-based methods also adopt an information theoretic approach, reducing the cross entropy between the expected and the predicted outputs during model learning [87]. Hence, they may also be categorized under hybrid approaches.
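The difference between a parametric and a nonparametric model of normal data can be made concrete with a small sketch. It assumes one-dimensional speed samples; the single Gaussian stands in for the parametric case (a simplification of the Gaussian mixture models cited above), and a fixed-width histogram stands in for the nonparametric case. The function names and the bin width are illustrative assumptions.

```python
import math
import statistics
from collections import Counter


def fit_gaussian(samples):
    """Parametric model: the whole profile is two numbers, mean and variance."""
    return statistics.fmean(samples), statistics.pvariance(samples)


def gaussian_log_likelihood(x, mu, var):
    """Log-density of x under N(mu, var); low values suggest an anomaly."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def fit_histogram(samples, bin_width):
    """Nonparametric model: relative frequency of each occupied bin."""
    counts = Counter(int(x // bin_width) for x in samples)
    return {b: c / len(samples) for b, c in counts.items()}


def histogram_score(hist, x, bin_width):
    """Anomaly score: 1 minus the relative frequency of the bin holding x."""
    return 1.0 - hist.get(int(x // bin_width), 0.0)


# Usage: speeds with two normal modes (congested and free-flowing traffic).
speeds = [10, 11, 12, 10, 11, 50, 51, 52, 50, 51]
mu, var = fit_gaussian(speeds)
hist = fit_histogram(speeds, bin_width=5)
print(gaussian_log_likelihood(30, mu, var))  # near the mean of the single Gaussian
print(histogram_score(hist, 30, bin_width=5))  # falls in a bin never seen in training
```

In this toy data, a speed of 30 lies between the two modes: the single Gaussian treats it as highly likely, whereas the histogram assigns it the maximum score. A mixture with more than one component is one way a parametric model can avoid this mismatch.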
II-C2 Proximity-based
In proximity-based approaches, anomalies are decided by how close they are to their neighbors. In distance-based approaches, the assumption is that normal data have dense neighborhoods [38]. Density-based approaches compare the density around a point with the density around its local neighbors. The relative density of a point compared to its neighbors is computed as an outlier score [107].
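Both proximity-based ideas can be sketched in a few lines. The example below assumes two-dimensional feature points and uses the mean distance to the k nearest neighbors as a distance-based score; the relative score is a simplified ratio in the spirit of density-based methods, not an implementation of any specific published algorithm. The function names and the `skip_index` parameter are hypothetical.

```python
import math


def knn_distance(point, data, k, skip_index=None):
    """Mean Euclidean distance from `point` to its k nearest points in `data`.

    `skip_index` excludes one entry, so a training point is not its own neighbor.
    """
    dists = sorted(
        math.dist(point, q) for i, q in enumerate(data) if i != skip_index
    )
    return sum(dists[:k]) / k


def relative_density_score(point, data, k):
    """Ratio of the point's k-NN distance to that of its k nearest neighbors.

    Values well above 1 mean the point sits in a sparser region than its
    neighbors do. Assumes the neighbors do not all coincide (non-zero distances).
    """
    nearest = sorted(range(len(data)), key=lambda i: math.dist(point, data[i]))[:k]
    own = knn_distance(point, data, k)
    neighbors = [knn_distance(data[i], data, k, skip_index=i) for i in nearest]
    return own / (sum(neighbors) / k)


# Usage: a compact group of normal points and two query points.
normal = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
print(relative_density_score((0.5, 0.4), normal, k=2))  # inside the group
print(relative_density_score((5, 5), normal, k=2))      # far from the group
```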
II-C3 Classification-based
Classification-based anomaly detection methods assume that a classifier can distinguish between normal and anomalous classes in a given feature space. Classification-based anomaly detection techniques can be divided into two categories: one-class and multi-class. Multi-class classification-based anomaly detection techniques assume that the training data contain labeled instances of normal and anomalous classes. A data point is assumed anomalous if it falls in the anomalous class [32]. One-class classification (OCC)-based anomaly detection techniques assume that all training data have one label [190, 192, 139, 205]. Such techniques learn a discriminative boundary around the normal instances using a one-class classification algorithm. Support Vector Machines (SVMs) have been used extensively for anomaly detection in the one-class setting in visual surveillance [29, 139]. Rule-based approaches learn rules that capture the normal behavior of a system [156]. A test instance that is not covered by any such rule is considered an anomaly.
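The idea of learning a boundary around normal instances only can be sketched with a deliberately simple stand-in: a sphere around the centroid of the training data. This is not a one-class SVM; it only illustrates the one-class setting, in which training uses a single label and anything outside the learned boundary is flagged. The function names and the 0.95 quantile are assumptions made for this example.

```python
import math


def fit_sphere(normal_points, quantile=0.95):
    """Learn a centre and radius that enclose most of the normal training points."""
    n, dim = len(normal_points), len(normal_points[0])
    center = tuple(sum(p[d] for p in normal_points) / n for d in range(dim))
    dists = sorted(math.dist(p, center) for p in normal_points)
    # Radius is the distance at the chosen quantile of the training distances.
    radius = dists[min(n - 1, int(quantile * n))]
    return center, radius


def classify(center, radius, point):
    """Label a test point by which side of the learned boundary it falls on."""
    return "anomalous" if math.dist(point, center) > radius else "normal"


# Usage: train on normal points only, then label unseen points.
center, radius = fit_sphere([(0, 0), (1, 0), (0, 1), (1, 1)])
print(classify(center, radius, (0.5, 0.5)))
print(classify(center, radius, (3, 3)))
```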
II-C4 Prediction-based
Prediction-based approaches detect anomaly by calculating the variation between predicted and actual spatio-temporal characteristics of the feature descriptor [108]. HMM and LSTM models rely on such approaches for anomaly detection [20, 118, 119].
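The predict-then-compare principle can be illustrated with a trajectory of (x, y) positions. The sketch below replaces the HMM or LSTM predictors mentioned above with a constant-velocity predictor, purely to keep the example short; the function names and the error threshold are hypothetical.

```python
import math


def prediction_errors(trajectory):
    """Error between each observed position and a constant-velocity prediction.

    The prediction for step t extrapolates the displacement between steps
    t-2 and t-1, so errors are available from the third position onward.
    """
    errors = []
    for prev2, prev1, actual in zip(trajectory, trajectory[1:], trajectory[2:]):
        predicted = (2 * prev1[0] - prev2[0], 2 * prev1[1] - prev2[1])
        errors.append(math.dist(predicted, actual))
    return errors


def anomalous_steps(trajectory, max_error):
    """Indices of positions whose prediction error exceeds `max_error`."""
    errors = prediction_errors(trajectory)
    return [i + 2 for i, e in enumerate(errors) if e > max_error]


# Usage: a vehicle moving straight along x, then turning abruptly.
track = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 3), (3, 6)]
print(anomalous_steps(track, max_error=1.0))
```

Only the step at which the motion changes is flagged; once the new direction is established, the predictor follows it again.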
II-C5 Reconstruction-based
In reconstruction-based techniques, the assumption is that normal data can be embedded into a lower dimensional subspace in which normal instances and anomalies appear differently. Anomaly is measured based on the data reconstruction error. Some examples are sparse coding [172, 218, 208], autoencoder [59], and principal component analysis (PCA)-based approaches [107].
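A PCA-style reconstruction error can be sketched without external libraries by keeping only the first principal direction, estimated here by power iteration on the covariance matrix. This is a minimal illustration under stated assumptions: the normal data lie close to a line, the starting vector is not orthogonal to the principal direction, and the covariance is non-zero. The function names are hypothetical.

```python
import math


def fit_principal_direction(points, iterations=100):
    """Return the mean and the first principal direction (unit vector)."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    centered = [[p[d] - mean[d] for d in range(dim)] for p in points]
    cov = [
        [sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
        for i in range(dim)
    ]
    v = [1.0] * dim  # assumed not orthogonal to the principal direction
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def reconstruction_error(point, mean, direction):
    """Distance between a point and its projection onto the learned subspace."""
    centered = [x - m for x, m in zip(point, mean)]
    coeff = sum(c * d for c, d in zip(centered, direction))
    residual = [c - coeff * d for c, d in zip(centered, direction)]
    return math.sqrt(sum(r * r for r in residual))


# Usage: normal points lie near the line y = x; one test point does not.
normal = [(0, 0), (1, 1.1), (2, 1.9), (3, 3.0), (4, 4.1)]
mean, direction = fit_principal_direction(normal)
print(reconstruction_error((2, 2), mean, direction))  # small error
print(reconstruction_error((4, 0), mean, direction))  # large error
```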
II-C6 Other Approaches
There are two types of clustering approaches. One relies on the assumption that normal data lie in a cluster, while anomalous data do not get associated with any cluster [145]. The latter type is based on the assumption that normal data instances belong to big and dense clusters, while anomalies belong to small clusters. Fuzzy inference systems take a fuzzy data point and use the rules related to membership, and the strength at which the data point fires the rules, to decide whether the data is anomalous or not [201, 98]. Heuristic methods intuitively decide on anomalies using feature values, spatial location, and contextual information. However, many practical systems do not entirely depend on one technology; rather, hybrid approaches are used for anomaly detection [187, 33, 123]. Table IV presents the aforementioned categorization.
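The second clustering assumption, that anomalies end up in small clusters, can be illustrated with a very simple clustering scheme. The sketch below uses leader clustering (each point joins the first cluster whose leader is within a fixed radius) only because it fits in a few lines; the surveyed methods use other clustering algorithms. The radius and minimum cluster size are assumed values.

```python
import math


def leader_clustering(points, radius):
    """Assign each point to the first cluster whose leader is within `radius`.

    A point that is close to no existing leader starts a new cluster.
    """
    clusters = []  # list of (leader, members)
    for p in points:
        for leader, members in clusters:
            if math.dist(p, leader) <= radius:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters


def points_in_small_clusters(clusters, min_size):
    """Points whose cluster has fewer than `min_size` members."""
    return [p for _, members in clusters if len(members) < min_size for p in members]


# Usage: two dense groups and one isolated point.
data = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 0)]
clusters = leader_clustering(data, radius=1.0)
print(points_in_small_clusters(clusters, min_size=2))
```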
II-D Features Used in Anomaly Detection
As mentioned earlier, anomaly detection is essentially done by applying a specific technique to the extracted features. However, in visual surveillance, the primary data is a video, which is a sequence of frames. Hence, it is essential to extract the relevant features from the videos, as these features become the input to the specific technique used in anomaly detection. The choice of feature plays a key role in the capability of detecting specific anomalies. In some methods, preprocessing essentially involves extracting the foreground information and applying specific techniques for finding objects from the foreground [91, 96, 177, 199]. Also, histograms extracted from the pixel level features can become inputs to anomaly detection methods [192, 193, 217, 38]. Some methods use detected objects or object trajectories as inputs to the anomaly detection methods [51, 104, 221]. Deep neural networks (DNN) extract features automatically and use them for anomaly detection [182, 155, 92].
Features are typically in the form of vectors corresponding to the data. The method proposed in [59] uses histograms of oriented gradients (HOG), histograms of optical flows (HOF), improved trajectory features [184], and automatic features extracted using DNN. A mixture of dynamic textures has been used in [96]. Histograms of oriented swarm accelerations (HOSA) coupled with histograms of oriented gradients (HOGs) have been used in learning [77]. Authors of [104] have used a 3D-tube representation of trajectories as features, using the contextual proximity of neighboring trajectories for learning normal trajectories. In [52], a Fisher vector corresponding to each trajectory, obtained using the optical flow of the object and its position, has been used. Histograms of optical flow and motion entropy (HOFME) have been used in [38]. In DNN-based systems, high level features are automatically extracted.
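To show how low-level motion can become a fixed-length feature vector, the sketch below builds a simplified histogram of optical flow orientations. It assumes that flow vectors (dx, dy) for a region have already been computed by some optical flow method, and it omits the spatial cells and normalization schemes of the published HOF descriptors. The function name and bin count are illustrative.

```python
import math


def flow_orientation_histogram(flow_vectors, num_bins=8):
    """Magnitude-weighted histogram of flow directions, normalised to sum to 1."""
    hist = [0.0] * num_bins
    bin_width = 2 * math.pi / num_bins
    for dx, dy in flow_vectors:
        magnitude = math.hypot(dx, dy)
        if magnitude == 0:
            continue  # static pixels carry no direction
        angle = math.atan2(dy, dx) % (2 * math.pi)  # map to [0, 2*pi)
        index = min(int(angle / bin_width), num_bins - 1)
        hist[index] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


# Usage: most motion points up-right, a smaller part points up-left.
flows = [(1, 1), (2, 2), (-1, 1), (0, 0)]
print(flow_orientation_histogram(flows, num_bins=4))
```

The resulting vector has the same length regardless of how many flow vectors the region contains, which is what allows it to serve as input data for the detection approaches discussed earlier.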
Broadly, the features can be classified as object oriented and non-object oriented. The classification is represented in Fig. 4. Using object oriented features, anomalies can be detected by extracting the objects [103, 89] or trajectories [104, 51, 121]. Objects or trajectories represented in the form of feature descriptors become the data for anomaly detection. In the non-object oriented approach, low-level descriptors for pixel or pixel group features, intensities, optical flows, or resultant features from spatio-temporal cubes (STC) [95, 128, 83, 100, 219, 146, 153] have been used for anomaly detection. Some methods use hybrid features for anomaly detection [94, 90, 45, 39]. Some of the important works using the aforementioned features are summarized in Table V.
II-E Applied Areas
In this section, we discuss the research works that have been carried out so far, focusing on scenes and datasets. Typical scenes are road segments, junctions, parking areas, highways, pedestrian paths, etc. A few of the important research works have been summarized in Table VI. We mainly highlight the underlying techniques, applicable scenes, anomaly types and datasets. The datasets often used in such work are QMUL [65], CAVIAR [1], UCSD [116], Bellview [214], Person [7], UMN [122], ARENA [141], Avenue [60], U-turn [18], MIT Trajectory [198], MIT [197], MIT parking trajectory [196], NGSIM [133], AIRS [9], PETS2009 [46], Behave [24], i-LIDS [11], ShanghaiTech [113], NVIDIA CITY [135], BOSS [168], Car Accident [169], and Idiap [178].
II-F Online vs. Offline
The majority of the techniques applied for anomaly detection focus on online usage [152, 91, 22, 153, 167, 125, 7]. Some methods [111, 83, 145, 84] can be termed near real-time because detection can happen only after segmenting test videos from the real scene. Offline methods are also used in road networks, especially for data analysis, though the results are not immediate [104, 30, 117]. However, online methods are preferred since they generate instantaneous results. A categorization is presented in Table VII.
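The practical difference between online and offline operation is that an online method must produce a decision for each observation as it arrives, without access to the full recording. A minimal sketch of this behavior is given below, using Welford's running mean and variance for a one-dimensional feature. The class name, the warm-up length and the threshold are assumptions, and the choice not to update the profile with flagged values is one possible design among several.

```python
class OnlineDetector:
    """Streaming z-score detector that updates its profile one value at a time."""

    def __init__(self, threshold=3.0, warmup=5):
        self.n = 0          # number of values folded into the profile
        self.mean = 0.0     # running mean
        self.m2 = 0.0       # running sum of squared deviations (Welford)
        self.threshold = threshold
        self.warmup = warmup

    def update(self, x):
        """Return True if x is flagged; otherwise fold x into the profile."""
        flagged = False
        if self.n >= self.warmup:
            std = (self.m2 / self.n) ** 0.5
            flagged = std > 0 and abs(x - self.mean) / std > self.threshold
        if not flagged:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return flagged


# Usage: decisions are produced immediately, one per incoming measurement.
detector = OnlineDetector()
stream = [40, 41, 39, 40, 42, 41, 5, 40]
print([detector.update(x) for x in stream])
```

An offline method would instead collect the whole stream first and analyze it afterwards, which permits more expensive models but delays the result.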
III Critical Analysis
This discussion is purely in the context of visual surveillance. Though most of the papers discussed in this survey address anomaly detection, we have observed four key issues with these methods: (i) Benchmark dataset-based comparisons are used to show the effectiveness against the state-of-the-art [190, 148, 111, 205]. Though benchmarks may be relevant for comparisons, they may not contain all real-life situations. For example, though anomaly detection works fine on the Avenue [60] dataset, two of the proposed methods [111, 37] produce more false alarms when applied to the real-life QMUL [65] dataset. Therefore, we believe the methods need to be relevant for real-life scenarios and should be applicable to long duration videos. (ii) Secondly, due to the aforementioned trend, a very limited amount of research [168, 161, 32] has been carried out on developing generic techniques applicable to a variety of datasets. (iii) There has been hardly any illumination independent research [161, 211] except for accident-type anomaly detection. The problem is not entirely due to the limitations of the learning models. It is equally dependent on the dataset types and the lack of illumination independent feature extraction. Possibly, with the emergence of DNN-based modeling, these issues can be addressed in the future. An object oriented approach might yield better results than histogram-based approaches, as humans do not think of pixels and their motion when detecting anomalies, but rely on observations of object motion. Researchers can make datasets containing segments of the same scene under varying illumination conditions. (iv) Some approaches remove the background and focus on foreground features for anomaly detection [50, 91, 172]. We think background information should not be ignored, as anomalies also depend on environmental conditions. For example, the chance of accidents on a rainy day is higher than that on a sunny day.
Obstructions on roads due to various factors should be taken into consideration while preparing datasets. Very little work has been done on this front [40, 91].
III-A Challenges and Possibilities
Some of the pressing challenges in video-based anomaly detection are:
- Illumination: Even though a handful of anomaly detection methods have already been proposed, the number of methods that can handle illumination variations is limited [99, 84, 202]. This is due to the lack of illumination-agnostic feature extraction from the videos. The criteria or methods used under different illumination conditions can be different for real-life applications.
- Pose and Perspective: Camera angles focusing on the surveillance area can often have a substantial impact on the performance of anomaly detection, as the appearance of a vehicle may change depending on its distance from the cameras [175, 56, 127]. Though object detection accuracy has increased manifold using deep neural network based methods, there are still challenges in tracking smaller objects. Humans can detect objects at different poses with ease, while machine learning may face difficulties in detecting and tracking the same object under pose variations.
- Heterogeneous object handling: Anomaly detection frameworks are largely based on modeling the scene and its entities [20, 189, 157, 34, 153, 68, 118, 207, 168, 73, 84]. However, modeling heterogeneous objects in a scene or learning the movement of heterogeneous objects in a scene can be difficult at times.
- Sparse vs. Dense: The methods used for detecting anomalies in sparse and dense conditions are different. Though some of the methods [111, 37] are good at locating anomalies in sparse conditions, dense scene-based methods can generate many false negatives.
- Curtailed tracks: Since many anomaly detection methods are based on vehicle trajectories [8, 117, 20, 39, 207], the underlying tracking algorithms are expected to perform accurately. Even though tracking accuracies have increased in the last decade, many of the existing tracking algorithms do not work under different scenarios [136, 175]. Tracking under occlusion is another challenge, though humans can easily track occluded objects visually.
- Lack of real-life datasets: There is a need for real-life datasets to see the effectiveness of anomaly detection techniques.
There is ample scope and need for anomaly detection research based on the gaps discussed earlier. With the advancements in machine learning techniques and affordable hardware, computer vision-based behavior analysis, anomaly detection and anomaly prediction can leapfrog in the coming years. Deep learning-based hybrid frameworks can handle diverse traffic scenarios. This can also help to build fully automatic traffic analysis frameworks capable of reporting events of interest to the stakeholders.
IV Conclusion
In this paper, we have revisited important computer vision-based survey papers. Then, we explored various anomaly detection techniques that can be applied for road network entities involving vehicles, people, and their interaction with the environment. We treat anomaly detection by taking data as the primary unit detailing the learning techniques, features used in learning, approaches employed for anomaly detection, and applied scenarios for anomaly detection. We intend to set a few future directions by looking into the gaps in the current computer vision-based techniques through discussions on various possibilities.
References
- [1] Video dataset. http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1/, 2004. [Online; accessed 20-Jan-2004].
- [2] Learning deep representations of appearance and motion for anomalous event detection, 2015.
- [3] NIPS 2016 tutorial: Generative adversarial networks, 2016.
- [4] Understanding traffic density from large-scale web camera data, 2017.
- [5] A. C. B. Abdallah, M. Gouiffès, and L. Lacassagne. A modular system for global and local abnormal event detection and categorization in videos. Machine Vision and Applications, 27(4):463–481, 2016.
- [6] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, S. Süsstrunk, et al. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11):2274–2282, 2012.
- [7] A. Adam, E. Rivlin, I. Shimshoni, and D. Reinitz. Robust real-time unusual event detection using multiple fixed-location monitors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(3):555–560, 2008.
- [8] S. A. Ahmed, D. P. Dogra, S. Kar, and P. P. Roy. Trajectory-based surveillance analysis: A survey. IEEE Transactions on Circuits and Systems for Video Technology, pages 1–1, 2018.
