Unsupervised Anomalous Trajectory Detection for Crowded Scenes

Deepan Das; Deepak Mishra

arXiv:1907.01717·cs.CV·July 4, 2019

TL;DR

This paper introduces an unsupervised clustering-based algorithm for detecting anomalous trajectories in crowded scenes, utilizing trajectory extraction, feature analysis, mean-shift clustering, and entropy-based anomaly detection.

Contribution

It proposes a novel unsupervised method combining multiple features and clustering for effective anomaly detection in crowded scene videos.

Findings

1. Accurate detection of anomalous trajectories in diverse crowd scenes
2. Effective use of mean-shift clustering and Shannon entropy for anomaly identification
3. Robust performance across different crowd motion patterns

Tables

Table I: Results on several crowded scene videos

Video	Precision	Recall	F-score	Accuracy
Crowded Subway Exit	0.8258	0.9944	0.9023	96.31%
Pilgrim Sequence	0.8221	0.9965	0.9009	98.15%
Intersection Sequence	0.7287	0.9971	0.8420	98.68%

Table II: Comparison of results (overall accuracy)

Method	Accuracy
Guo et al.	96%
Xu et al.	87%
Biswas et al.	96.7%
Proposed	98.68%

Equations

$$n_{T,j,\epsilon} = \left|\{T^{i} \mid \forall i \neq j,\ d(f^{j}, f^{i}) < \epsilon\}\right|$$

$$F^{j} = [n_{T,j,\epsilon_{1}}, n_{T,j,\epsilon_{2}}, n_{T,j,\epsilon_{3}}]$$

$$D(\tau_{1}, \tau_{2})\big|_{T} = \frac{\int_{T} d(\tau_{1}(t), \tau_{2}(t))\, dt}{|T|}$$

$$x(t) = a_{0} + a_{1}t + a_{2}t^{2} + a_{3}t^{3}$$

$$y(t) = b_{0} + b_{1}t + b_{2}t^{2} + b_{3}t^{3}$$

$$f_{s} = [a_{0}, \dots, a_{3}, b_{0}, \dots, b_{3}]$$

$$\sigma = \sqrt{E[(X - \mu)^{2}]}$$

$$\nabla f_{h,k}(x) = \frac{2c_{k,d}}{nh^{d+2}} \sum_{i=1}^{n} (x - x_{i})\, K'\!\left(\left\|\frac{x - x_{i}}{h}\right\|^{2}\right)$$

$$\nabla f_{h,k}(x) = \frac{2c_{k,d}}{nh^{d+2}} \left[\sum_{i=1}^{n} g\!\left(\left\|\frac{x - x_{i}}{h}\right\|^{2}\right)\right] m_{h,G}(x)$$

$$m_{h,G}(x) = \frac{\sum_{i=1}^{n} x_{i}\, g\!\left(\left\|\frac{x - x_{i}}{h}\right\|^{2}\right)}{\sum_{i=1}^{n} g\!\left(\left\|\frac{x - x_{i}}{h}\right\|^{2}\right)} - x$$

$$p_{j,k} = \frac{\mathrm{distance}(c_{k,i}, f_{j})}{\sum_{m=1}^{n} \mathrm{distance}(c_{m,i}, f_{j})}$$

$$H_{j} = -\sum_{k=1}^{n} p_{j,k} \log_{a} p_{j,k}$$

$$H = [H_{1}, H_{2}, \dots, H_{numTraj}]$$

Full text

Unsupervised Anomalous Trajectory Detection for Crowded Scenes

Deepan Das

Dept. of Electronics and Telecommunication Engineering

Indian Institute of Engineering Science and Technology, Shibpur

WB, India

Deepak Mishra

Dept. of Avionics

Indian Institute of Space Science and Technology

Thiruvananthapuram, India

Abstract

We present an improved clustering-based, unsupervised anomalous trajectory detection algorithm for crowded scenes. The proposed work is based on four major steps, namely, extraction of trajectories from crowded scene video, extraction of several features from these trajectories, independent mean-shift clustering and anomaly detection. First, the trajectories of all moving objects in a crowd are extracted using a multi-feature video object tracker. These trajectories are then transformed into a set of feature spaces. Mean-shift clustering is applied on these feature matrices to obtain distinct clusters, while a Shannon entropy based anomaly detector identifies corresponding anomalies. In the final step, a voting mechanism identifies the trajectories that exhibit anomalous characteristics. The algorithm is tested on crowd scene videos from two datasets. The videos represent various possible crowd scenes with different motion patterns, and the method performs well in detecting the expected anomalous trajectories from the scene.

Index Terms:

Crowd, Anomaly Detection, Trajectory, Clustering, Entropy.

I Introduction

Computer Vision research aims to converge at human-like abilities to interpret and extract useful information regarding behavioural patterns and anomalies from a descriptive set of visual data. However, human abilities have glaring limitations when it comes to analyzing simultaneously changing signals[1]. A crowd presents itself as a considerably large collection of simultaneously changing parameters, characterized by usual dominant patterns and some observable abnormalities. Safety is the primary reason to understand crowd dynamics and isolate anomalous patterns. With crowd-related violent incidents on the rise, it is paramount that we expand our studies to analyze the intricate and complex nature of crowds. Understanding anomalies in a crowded scene enables better public space design and also allows better surveillance systems to be built. Earlier works like those of Kim et al.[2] used a Mixture of Probabilistic Principal Component Analyzers to learn patterns of local optical flow and then validate the consistency by Markov Random Field. Cong et al.[3] used a multi-scale histogram of Optical Flow as the feature descriptor and used it as the basis for a sparse reconstruction. Ali et al.[4] used Lagrangian Particle Dynamics to model coherent crowd flow as fluid flow.

In general, supervised methods require a considerable amount of labeled data, which is used directly to build the connection between video features and video labels. Unsupervised anomaly detection systems have no such labels to learn from, so developing them proves more challenging than developing supervised ones. An anomaly in a crowded scene can be determined from the motion patterns of its constituent pedestrians and objects. Analyzing trajectory data enables one to predict and identify anomalies with a high degree of accuracy. Early work on trajectory analysis includes that of Fu et al.[5], who proposed a hierarchical clustering framework to classify vehicle motion trajectories based on pairwise similarities, but with the limitation of using only a single feature for clustering. Progressing further, Anjum and Cavallaro[6] proposed the use of multiple features in a mean-shift clustering based framework and identified outliers using a basic measure based on mean trajectory location. Antonini et al.[7] transformed the input trajectories using Independent Component Analysis and then used Euclidean distance to find similarities between trajectories. The Shannon entropy measure has proved useful in many applications, including video key frame selection[8], network anomaly detection[9] and worm detection[10]. The principal contributions of this paper are the incorporation of a multi-feature object tracker that works well in crowded scenes[11] and the use of multiple features for independent clustering. Furthermore, an information-theoretic Shannon entropy measure is proposed to detect anomalies for each cluster and then identify overall anomalous trajectories for the entire scene using a voting mechanism. The paper is organized as follows: Section II discusses the trajectory estimation and feature extraction procedure. Section III discusses the clustering task, Section IV focuses on the anomaly detection mechanism, and Section V presents the results obtained using the algorithm.

II Trajectory and Feature Extraction

The first task is to evaluate trajectory paths for all moving objects.

II-A Trajectory Extraction

The estimation of trajectories in crowded scenes is a challenging task due to factors such as a high degree of occlusion, difficulty in tracking individual objects and arbitrary changes in the nature of the motion. To tackle this problem, we use a multi-object tracker that works well in crowded scenes, as demonstrated by Sharma et al.[11]. In this approach, each frame is divided into non-overlapping boxes and low-level features are detected inside each box. Following this, the centroid of all the detected feature points in each box is tracked using the standard Kanade-Lucas tracking algorithm. Fresh boxes are introduced periodically to track newly introduced objects.
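The box-and-centroid step described above can be sketched as follows. This is only an illustration under stated assumptions, not the tracker of Sharma et al.[11]: the feature points are assumed to be already detected, and the function name `box_centroids` and the parameter `box_size` are hypothetical. The tracking step itself is left out; a pyramidal Lucas-Kanade routine such as OpenCV's `cv2.calcOpticalFlowPyrLK` would be one possible choice for following the centroids from frame to frame.

```python
import numpy as np


def box_centroids(points, frame_shape, box_size):
    """Group feature points into non-overlapping square boxes.

    points      : (N, 2) array-like of (x, y) feature locations (assumed given).
    frame_shape : (height, width) of the video frame in pixels.
    box_size    : side length of each box in pixels (hypothetical parameter).

    Returns a dict mapping a box index to the centroid of the points inside it.
    Boxes that contain no feature points are omitted.
    """
    points = np.asarray(points, dtype=float)
    _, width = frame_shape
    n_cols = int(np.ceil(width / box_size))

    # Integer grid coordinates of the box that contains each point.
    col = np.minimum((points[:, 0] // box_size).astype(int), n_cols - 1)
    row = (points[:, 1] // box_size).astype(int)
    box_id = row * n_cols + col

    # One centroid per occupied box; these are the points that get tracked.
    return {int(b): points[box_id == b].mean(axis=0) for b in np.unique(box_id)}


if __name__ == "__main__":
    pts = [(1, 1), (3, 3), (12, 2)]
    for box, centroid in box_centroids(pts, frame_shape=(20, 20), box_size=10).items():
        print(box, centroid)
```

Tracking one centroid per box rather than every feature point keeps the number of tracks bounded by the grid size, which is one plausible reason for the design in a dense crowd.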

II-B Feature Extraction

Most trajectory clustering and anomaly classification methods use a single feature descriptor for the task. We propose the use of multiple features, namely:

II-B1 Density

A trajectory can have varying densities around it, depending on the size of its neighbourhood. The density feature is thus computed using varying neighbourhood sizes $\epsilon$ . We have considered three sizes, as proposed by Sharma et al.[11].

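$$n_{T,j,\epsilon} = \left|\{T^{i} \mid \forall i \neq j,\ d(f^{j}, f^{i}) < \epsilon\}\right|$$

$$F^{j} = [n_{T,j,\epsilon_{1}}, n_{T,j,\epsilon_{2}}, n_{T,j,\epsilon_{3}}]$$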

In this work, we are also interested in distances that describe the similarity of objects along time and therefore are computed by analysing the way distance between the objects varies over time. This gives us a measure of the spatio-temporal density in the most natural way possible:

$$D(\tau_{1}, \tau_{2})\big|_{T} = \frac{\int_{T} d(\tau_{1}(t), \tau_{2}(t))\, dt}{|T|}$$

where $d(\tau_{1}(t),\tau_{2}(t))$ represents the pairwise distance between two trajectories at the instant $t$ .
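A minimal sketch of the density feature follows, assuming that all trajectories are sampled at the same time instants and that the integral over $T$ is approximated by a mean over the shared samples. The function names and the example radii are hypothetical; the paper takes its three neighbourhood sizes from Sharma et al.[11] and does not list them here.

```python
import numpy as np


def time_averaged_distance(traj_a, traj_b):
    """Discrete stand-in for D(tau_1, tau_2)|_T: mean point-wise distance.

    Assumption: both trajectories are (steps, 2) arrays sampled at the same
    instants, and only the first min(len) samples overlap in time.
    """
    steps = min(len(traj_a), len(traj_b))
    diff = np.asarray(traj_a[:steps], dtype=float) - np.asarray(traj_b[:steps], dtype=float)
    return np.linalg.norm(diff, axis=1).mean()


def density_features(trajectories, radii):
    """Count, for every trajectory j, the other trajectories closer than each radius."""
    n = len(trajectories)
    dist = np.zeros((n, n))
    for j in range(n):
        for i in range(j + 1, n):
            dist[j, i] = dist[i, j] = time_averaged_distance(trajectories[j], trajectories[i])

    features = np.zeros((n, len(radii)), dtype=int)
    for col, eps in enumerate(radii):
        within = dist < eps
        np.fill_diagonal(within, False)  # enforce i != j
        features[:, col] = within.sum(axis=1)
    return features


if __name__ == "__main__":
    t = np.arange(5.0)
    a = np.stack([t, np.zeros(5)], axis=1)        # walks along y = 0
    b = np.stack([t, np.ones(5)], axis=1)         # walks along y = 1
    c = np.stack([t, np.full(5, 10.0)], axis=1)   # walks along y = 10
    print(density_features([a, b, c], radii=(2.0, 9.5, 11.0)))
```

In the example, the isolated trajectory `c` has no neighbour at the smallest radius, which is the kind of low-density signature the feature is meant to expose.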

II-B2 Shape

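Every trajectory sketches a particular shape across the spatio-temporal scene, and this shape is represented by a cubic polynomial in time. The coefficients are calculated separately for the $x$ and $y$ coordinates, yielding the $f_{s}$ feature vector.

$$x(t) = a_{0} + a_{1}t + a_{2}t^{2} + a_{3}t^{3}$$

$$y(t) = b_{0} + b_{1}t + b_{2}t^{2} + b_{3}t^{3}$$

$$f_{s} = [a_{0}, \dots, a_{3}, b_{0}, \dots, b_{3}]$$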
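The shape feature can be sketched with an ordinary least-squares fit. The fitting method is an assumption, since the text only states that the shape is represented as a polynomial; `shape_feature` is a hypothetical name, and the default time axis (frame indices 0, 1, 2, ...) is also assumed. At least four samples are needed for a cubic fit.

```python
import numpy as np


def shape_feature(traj, times=None):
    """Fit x(t) and y(t) with cubic polynomials and return [a0..a3, b0..b3].

    traj  : (steps, 2) array-like of (x, y) positions, steps >= 4.
    times : optional sample instants; frame indices are assumed if omitted.
    """
    traj = np.asarray(traj, dtype=float)
    if times is None:
        times = np.arange(len(traj), dtype=float)

    # np.polyfit returns the highest power first, so reverse to get a0..a3.
    a = np.polyfit(times, traj[:, 0], deg=3)[::-1]
    b = np.polyfit(times, traj[:, 1], deg=3)[::-1]
    return np.concatenate([a, b])


if __name__ == "__main__":
    t = np.arange(10.0)
    traj = np.stack([1 + 2 * t, t**3 - t], axis=1)
    print(np.round(shape_feature(traj), 6))
```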

II-B3 Mean Position

Trajectories separated by large distances may have similar velocities, directions and density features and consequently get clustered in the same group. To avoid this, a location measure is needed: $f_{l}=[\mathrm{mean}_{x},\mathrm{mean}_{y}]$ .

II-B4 Standard Deviation

Standard deviation is a widely used measure that quantifies the amount of variation or dispersion in time-series data.

$$\sigma = \sqrt{E[(X - \mu)^{2}]}$$

The trajectories extracted from each surveillance video will give rise to a distinct feature-space for each of the features mentioned above. These distinct feature spaces will be used for identifying anomalies for that particular feature, and thereafter, the detection of overall anomalies.
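The mean-position and standard-deviation features reduce to a few array operations. In this sketch, the standard deviation is taken per coordinate over the samples of one trajectory; that choice is an assumption, since the text does not say which series $X$ refers to. The name `location_and_spread` is hypothetical.

```python
import numpy as np


def location_and_spread(traj):
    """Return (f_l, sigma) for one trajectory.

    f_l   : [mean_x, mean_y], the mean position.
    sigma : per-coordinate population standard deviation, sqrt(E[(X - mu)^2]).
    """
    traj = np.asarray(traj, dtype=float)
    f_l = traj.mean(axis=0)
    sigma = traj.std(axis=0)  # ddof=0, matching the expectation form
    return f_l, sigma


if __name__ == "__main__":
    f_l, sigma = location_and_spread([(0, 0), (2, 0), (4, 0)])
    print(f_l, sigma)
```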

III Clustering

Clustering methods have gained immense popularity as a data analysis tool ever since Clements[12] introduced them in 1954. The dominant, usual features correspond to the denser regions of the probability density function of the data points. Using a Kernel Density Estimate, the modes of the probability density function can be found using either the Mean Shift[13, 14] or the Mountain method[15]. We use the Mean Shift method proposed by Fukunaga and Hostetler[13]. Since the anomaly detection algorithm proposed here revolves around clustering similar data points, the clustering algorithm has to be highly effective, and Mean Shift clustering meets this requirement.

III-A Mean Shift Clustering

Mean shift is a non-parametric, versatile, iterative algorithm with applications in varied fields like object tracking, texture segmentation and data mining. After estimating the probability density of the data points using a Kernel Density Estimate, a gradient ascent procedure associates each data point with the nearby peak of the data-set’s density function. The procedure defines a window around each data point, computes the mean of all the data points within the window, and shifts the centre of the window to the new mean until the process converges. On convergence, we obtain the modes of the density estimate, which serve as the centre-points of the clusters in the data. Suppose there are $n$ data points in the $d$-dimensional space $\mathbb{R}^{d}$ ; then the density estimate with kernel $K(x)$ and bandwidth $h$ can be denoted as $f_{h,k}(x)$ . If we define $g(x)=-K'(x)$ , so that $K$ is the shadow[16] of the kernel $G$ built from $g$ , and assume that the derivative of $K$ exists for all $x\in[0,\infty)$ , then the gradient of the density estimate can be written as:

$$\nabla f_{h,k}(x) = \frac{2c_{k,d}}{nh^{d+2}} \sum_{i=1}^{n} (x - x_{i})\, K'\!\left(\left\|\frac{x - x_{i}}{h}\right\|^{2}\right)$$

$$\nabla f_{h,k}(x) = \frac{2c_{k,d}}{nh^{d+2}} \left[\sum_{i=1}^{n} g\!\left(\left\|\frac{x - x_{i}}{h}\right\|^{2}\right)\right] m_{h,G}(x)$$

The modes of the density function are obtained among the zeros of the gradient of the density function. The first term in the product is proportional to the density estimate at $x$ computed with kernel $G$ , while the second term, or the mean shift is defined as the difference between the weighted mean and the centre of the Kernel window.

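$$m_{h,G}(x) = \frac{\sum_{i=1}^{n} x_{i}\, g\!\left(\left\|\frac{x - x_{i}}{h}\right\|^{2}\right)}{\sum_{i=1}^{n} g\!\left(\left\|\frac{x - x_{i}}{h}\right\|^{2}\right)} - x$$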

It can be observed that the mean shift vector always points in the direction of maximum increase in the density[17]. The modes, or cluster centres, are found independently for each feature, giving us, per feature, a non-overlapping set of trajectory groups, each characteristic of the cluster it belongs to.
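The iteration "move each window centre to the weighted mean of the data" can be sketched in a few lines. The Gaussian kernel, the stopping tolerance and the way nearby converged points are merged into one cluster centre are all assumptions made for this illustration; the paper does not specify them, and the function names are hypothetical.

```python
import numpy as np


def mean_shift_modes(data, bandwidth, n_iter=100, tol=1e-6):
    """Shift a copy of every data point uphill until it stops moving.

    Assumption: Gaussian kernel, for which g is proportional to
    exp(-0.5 * ||(x - x_i) / h||^2). Each update replaces x by the weighted
    mean of the data, i.e. x <- x + m_{h,G}(x).
    """
    data = np.asarray(data, dtype=float)
    points = data.copy()
    for _ in range(n_iter):
        scaled = (points[:, None, :] - data[None, :, :]) / bandwidth
        weights = np.exp(-0.5 * (scaled**2).sum(axis=2))       # shape (n, n)
        new_points = weights @ data / weights.sum(axis=1, keepdims=True)
        largest_shift = np.linalg.norm(new_points - points, axis=1).max()
        points = new_points
        if largest_shift < tol:
            break
    return points


def group_modes(converged, merge_radius):
    """Merge converged points closer than merge_radius into one cluster centre."""
    centres = []
    labels = np.empty(len(converged), dtype=int)
    for idx, point in enumerate(converged):
        for k, centre in enumerate(centres):
            if np.linalg.norm(point - centre) < merge_radius:
                labels[idx] = k
                break
        else:
            centres.append(point)
            labels[idx] = len(centres) - 1
    return np.array(centres), labels


if __name__ == "__main__":
    data = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
    centres, labels = group_modes(mean_shift_modes(data, bandwidth=1.0), merge_radius=0.5)
    print(centres)
    print(labels)
```

Running this once per feature space (density, shape, mean position, standard deviation) yields one set of cluster centres per feature, which is what the anomaly detector in the next section consumes.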

IV Anomaly Detection

The entire crowd is often characterized by some dominant patterns, based on which, the entire set of trajectories is clustered. The anomalous trajectories, present throughout the crowded scene may belong to any one of these clusters but as a general property, will not have a substantial degree of belongingness to any of the clusters. The entire mechanism depends on two major tasks, as follows: Detecting Anomalies for each independent feature space followed by the selection of those trajectories that exhibit anomalous behaviour in most of the cases, using a voting mechanism.

Shannon Entropy has found widespread applications in numerous domains, with anomaly detection being one. The greatest advantage of this technique is that it allows the summarization of the feature distributions in the form of a single number. Our approach is based on the simple idea that an anomalous trajectory would exhibit higher levels of entropy when compared to normal trajectories. Instead of comparing the distances between the means of the cluster centres and trajectories as in previous work[6], we build a probability distribution using the distances between a trajectory and all of the cluster centres. The entropy of this probability distribution is evaluated and if it exceeds a threshold, it is classified as an anomaly. The threshold should be data adaptive and must adapt itself with the changing properties of the data.

$C_{i}=[c_{1,i},c_{2,i},\ldots,c_{n,i}]$

$distvec_{j}=[\mathrm{distance}(c_{1,i},f_{j}),\ldots,\mathrm{distance}(c_{n,i},f_{j})]$

$C_{i}$ represents the cluster centres for a specific feature $i$ and the $distvec_{j}$ vector contains the distances between each cluster centre and trajectory $f_{j}$ . We further build the probability distribution $P_{j}=[p_{j,1},p_{j,2},\dots,p_{j,n}]$ where

$$p_{j,k} = \frac{\mathrm{distance}(c_{k,i}, f_{j})}{\sum_{m=1}^{n} \mathrm{distance}(c_{m,i}, f_{j})}$$

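An entropy measure is computed for each trajectory $j$ , and the values for all trajectories are collected in the vector $H$ :

$$H_{j} = -\sum_{k=1}^{n} p_{j,k} \log_{a} p_{j,k}$$

$$H = [H_{1}, H_{2}, \dots, H_{numTraj}]$$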
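For a single trajectory in a single feature space, the distance-based distribution and its entropy can be sketched as below. Euclidean distance and base-2 logarithms are assumptions (the text leaves both the distance function and the base $a$ open), and `trajectory_entropy` is a hypothetical name.

```python
import numpy as np


def trajectory_entropy(feature, centres, base=2.0):
    """Shannon entropy of the normalised distances from one feature vector
    to every cluster centre of the same feature space.

    feature : (d,) feature vector f_j of one trajectory.
    centres : (n, d) cluster centres c_{1,i}..c_{n,i} for feature i.
    """
    centres = np.asarray(centres, dtype=float)
    feature = np.asarray(feature, dtype=float)
    distances = np.linalg.norm(centres - feature, axis=1)

    total = distances.sum()
    if total == 0.0:
        return 0.0  # degenerate case: the feature coincides with every centre

    p = distances / total
    p = p[p > 0]  # the term 0 * log(0) is taken to be 0
    return float(-(p * np.log(p)).sum() / np.log(base))


if __name__ == "__main__":
    centres = [[0.0, 0.0], [4.0, 0.0]]
    print(trajectory_entropy([2.0, 0.0], centres))  # equidistant from both centres
    print(trajectory_entropy([0.0, 0.0], centres))  # sits exactly on one centre
```

A trajectory that sits on a cluster centre gives a distribution concentrated on the other centres and hence a lower entropy, while a trajectory equidistant from all centres gives the maximum, which matches the intuition that anomalies do not belong strongly to any cluster.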

Trajectories with an entropy value exceeding a threshold are marked anomalous for that feature. In a crowded scene, changes in its attributes are certain to occur, and they occur randomly. A particular section of the crowd can exhibit spatio-temporal changes in density and may also suddenly slow down or speed up, thereby affecting individual feature parameters of the trajectories. Moreover, new trajectories that are introduced after a fixed interval of time may have features similar to those of a particular cluster but may exhibit one or more abnormalities due to their late introduction. Therefore, we cannot treat all trajectories marked as anomalous by the above procedure as our desired set of abnormalities. A simple voting mechanism sieves out those trajectories that are marked anomalous in the majority of the feature spaces.
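The thresholding and voting steps might look like the sketch below. Two choices here are assumptions and not taken from the paper: the data-adaptive threshold is set to the mean plus `k` standard deviations of the entropies in each feature space, and "majority" is read as strictly more than half of the feature spaces. The name `vote_anomalies` and the parameter `k` are hypothetical.

```python
import numpy as np


def vote_anomalies(entropy_by_feature, k=1.0):
    """Flag trajectories that are anomalous in a majority of feature spaces.

    entropy_by_feature : (n_features, n_trajectories) array of entropies,
                         one row per feature space.
    k                  : assumed threshold width in standard deviations.
    """
    entropies = np.asarray(entropy_by_feature, dtype=float)

    # Assumed data-adaptive threshold, computed separately for each feature.
    thresholds = (entropies.mean(axis=1, keepdims=True)
                  + k * entropies.std(axis=1, keepdims=True))
    flags = entropies > thresholds          # per-feature anomaly marks

    votes = flags.sum(axis=0)               # number of features voting "anomalous"
    return votes > entropies.shape[0] / 2   # strict majority


if __name__ == "__main__":
    entropies = np.array([[0.1, 0.1, 0.1, 0.1, 0.9],
                          [0.1, 0.1, 0.1, 0.1, 0.9],
                          [0.1, 0.1, 0.1, 0.1, 0.9],
                          [0.9, 0.1, 0.1, 0.1, 0.1]])
    print(vote_anomalies(entropies))
```

In the example, the first trajectory is flagged by only one feature space and is therefore not reported, while the last one is flagged by three of the four and is.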

V Results

The method is tested on videos from two datasets, namely the Crowded scenes dataset used by Cheriyadat et al.[18] to detect dominant motions in crowds and the UCF crowd dataset, first used by Ali et al.[4]. To measure the effectiveness of the method, we first identify all possible anomalous trajectories from the video and then compare them with the classification results. Since the method operates directly on videos, we had to mark the anomalous trajectories in the actual video for the evaluation procedure. The results for three standard crowded videos from the mentioned datasets are tabulated in Table I.
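As a quick consistency check on Table I, the F-score column can be recomputed from the reported precision and recall, assuming the usual harmonic-mean definition $F = 2PR/(P+R)$ (the paper does not state the formula). The numbers below are copied from the table.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (assumed definition)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) as listed in Table I.
TABLE_I = {
    "Crowded Subway Exit": (0.8258, 0.9944, 0.9023),
    "Pilgrim Sequence": (0.8221, 0.9965, 0.9009),
    "Intersection Sequence": (0.7287, 0.9971, 0.8420),
}

if __name__ == "__main__":
    for video, (precision, recall, reported) in TABLE_I.items():
        computed = f_score(precision, recall)
        print(f"{video}: computed {computed:.4f}, reported {reported:.4f}")
```

The recomputed values agree with the reported column to four decimal places.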

The results indicate that this method exhibits excellent Specificity, i.e. the probability of classifying a normal trajectory as anomalous is extremely low. However, improvement can be achieved in the Sensitivity of the approach by improving the True Positive rate. It is to be noted that the method indicates almost all anomalous trajectories in the expected regions of interest with commendable accuracy. The graphical plots illustrate the results. The plots depicted in Figure 3 are from the Crowded subway exit sequence. The trajectories have been detected from the entire video sequence and thereafter, clustering has been done on the several feature-spaces as shown in Figures 3(a), 3(c), 3(e) and 3(g). The anomalous trajectories detected in each such feature space have been plotted in Figures 3(b), 3(d), 3(f) and 3(h). Following the voting mechanism, the final anomalous trajectories have been displayed as red curves with their origin points shown as blue dots in Figure 3(i). Figure 3(j) shows the overall crowded scene as being composed of the anomalous trajectories shown in red and the normal trajectories shown in blue. On close inspection of the video, one can find that the trajectories responsible for slowing down the crowd exiting the subway are closely represented by the ones detected as anomalous by the algorithm. These are, in essence, the peripheral trajectories present together with the principal crowd flow, which is represented closely by the blue section in Figure 3(j).

The method performs well when compared with different state-of-the-art methods, as summarized in Table II. The overall accuracy has been used as the metric for comparison here. The Information Bottleneck based approach[19] only extracts a speed-based feature to improve the shape analysis of trajectory data. This method shows an accuracy of about 96% on its task-specific datasets. Other unsupervised methods, like the one based on hierarchical pattern discovery[20], use a completely different approach and exhibit an accuracy of around 87%. Other abnormality detection methods that use the property of sparsity in abnormal events[21] exhibit an accuracy in the range of 88.71% to 96.7%.

VI Conclusion

This paper stresses the need for understanding crowd dynamics better and presents an unsupervised mechanism to detect anomalous trajectories. The method is application-ready: it generates trajectories from a video using a multi-object tracker and then clusters them based on multiple independent features. The use of multiple features for determining the clusters and the anomalies is based on the fact that an anomalous trajectory may possess similarity with a dominant pattern in one aspect, but differ significantly in a majority of aspects. A trajectory that is similar to most trajectories in terms of mean location and position may still cause disturbance in the scene due to its unnatural speed. This has been taken care of by using multiple features to detect the overall anomalies. The use of Shannon entropy provides a novel approach to determine the anomalies, considering the fact that a probability distribution is developed using the distances from all cluster centres and not only the specific cluster with which the trajectory is associated. An anomalous trajectory is unlikely to belong to any specific cluster to a significant degree, thereby maximizing entropy in the probability distribution. The proposed approach yields excellent results on the chosen crowd videos. This work can be made more efficient by developing a substantially large dataset that demarcates abnormal trajectories, where the trajectories are represented as a time series as used here. Trajectory representation such as this has been used to evaluate crowd flow segmentation, but here it has been put to use for abnormality detection. This lends the approach the added advantage of detecting the specific areas in the scene that contribute most to disturbance. Finally, this work may find extensive use in improving surveillance methods, better public space design, efficient event organization and possibly even in tracking rogue naval and air routes. This work can be improved by making it real-time and also by generalizing the entropy measure so that it classifies the anomalies optimally.

Bibliography

[1] N. Sulman, T. Sanocki, D. Goldgof, and R. Kasturi, “How effective is human video surveillance performance?” in Pattern Recognition, 2008. ICPR 2008. 19th International Conference on. IEEE, 2008, pp. 1–3.
[2] J. Kim and K. Grauman, “Observe locally, infer globally: a space-time MRF for detecting abnormal activities with incremental updates,” in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009, pp. 2921–2928.
[3] Y. Cong, J. Yuan, and J. Liu, “Abnormal event detection in crowded scenes using sparse representation,” Pattern Recognition, vol. 46, no. 7, pp. 1851–1864, 2013.
[4] S. Ali and M. Shah, “A Lagrangian particle dynamics approach for crowd flow segmentation and stability analysis,” in Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on. IEEE, 2007, pp. 1–6.
[5] Z. Fu, W. Hu, and T. Tan, “Similarity based vehicle trajectory clustering and anomaly detection,” in Image Processing, 2005. ICIP 2005. IEEE International Conference on, vol. 2. IEEE, 2005, pp. II–602.
[6] N. Anjum and A. Cavallaro, “Multifeature object trajectory clustering for video analysis,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 11, pp. 1555–1564, 2008.
[7] G. Antonini and J.-P. Thiran, “Counting pedestrians in video sequences using trajectory clustering,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 8, pp. 1008–1020, 2006.
[8] Q. Xu, Y. Liu, X. Li, Z. Yang, J. Wang, M. Sbert, and R. Scopigno, “Browsing and exploration of video sequences: A new scheme for key frame extraction and 3D visualization using entropy based Jensen divergence,” Information Sciences, vol. 278, pp. 736–756, 2014.