A performance study of anomaly detection using entropy method
A.A. Waskita, H. Suhartanto, L.T. Handoko

TL;DR
This study evaluates the entropy method for anomaly detection in sensor networks, demonstrating its superior ability to identify outliers compared to elliptical methods, especially when sensor data are uncorrelated.
Contribution
The paper provides an empirical comparison showing that the entropy method outperforms elliptical methods in detecting anomalies in uncorrelated sensor data.
Findings
Entropy method detects more outliers than elliptical method.
Entropy approach performs well with uncorrelated sensor data.
Sensor independence is crucial for the effectiveness of the entropy method.
A.A. Waskita (1,2), H. Suhartanto (2), L.T. Handoko (4,5)
(1) Center for Technology and Safety of Nuclear Reactor, National Nuclear Energy Agency,
Kawasan Puspiptek Serpong, Tangerang 15310, Indonesia
Email: [email protected]
(2) Faculty of Computer Science, University of Indonesia,
Kampus UI Depok, Depok 16424, Indonesia
Email: [email protected]
(4) Group for Theoretical and Computational Physics, Research Center for Physics, Indonesian Institute of Sciences,
Kawasan Puspiptek Serpong, Tangerang 15310, Indonesia
Email: [email protected]
(5) Department of Physics, University of Indonesia,
Kampus UI Depok, Depok 16424, Indonesia
Email: [email protected]
Abstract
An experiment to study the entropy method for an anomaly detection system has been performed. The study was conducted using real data generated by the distributed sensor network at the Intel Berkeley Research Laboratory. The experimental results were compared with the elliptical method and analyzed on two-dimensional data sets acquired from temperature and humidity sensors across 52 microcontrollers. Using binary classification to determine the upper and lower boundaries for each sensor series, it has been shown that the entropy method is able to detect more out-of-range sensor nodes than the elliptical method. It can be argued that the better result is mainly due to a limitation of the elliptical approach, which requires a certain correlation between the two sensor series, whereas the entropy approach treats each sensor series independently. This is very important in the present case, where the two sensor series are not correlated with each other.
Index Terms:
anomaly detection, elliptical method, entropy method
I Introduction
Detecting anomalies, especially in safety-critical systems, is very important to mitigate system failures before they occur [1]. In some systems, such failures could lead to tremendous environmental disasters. Therefore, those systems are always equipped with robust monitoring based on either wireless sensor networks (WSN) or wired sensor networks. The network typically involves various types and ranges of sensors which transmit the acquired data to a central unit. In some cases, the sensors are attached to a nearby cascade controller that sits between them and the main unit, which may be located relatively far from the monitoring nodes in the field. In more complex systems, the network may consist of several tiers between the main system and the end monitoring nodes.
Examples of systems with large environmental impact include landslide early warning systems (LEWS), which involve micro-electromechanical system (MEMS) based sensors, fiber-optic strain sensing, and GPS tracking to monitor ground motion related to earthquakes or volcanic activity [2, 3], and forest fire detection and monitoring systems [4].
Some anomaly detection techniques were originally developed for cyber security, in particular to mitigate cyber attacks. For instance, intrusion detection systems (IDS) and intrusion prevention systems (IPS) have been developed in [5, 6, 7, 8]. In cyber security, these anomaly-based methods complement signature-based approaches. It should be noted that signature-based IDS perform better at detecting well-known intrusion patterns, while anomaly-based ones are better suited to unknown patterns [9].
In contrast with the unpredictable “patterns” of cyber attacks, which require a training procedure to define the “normal” patterns, most systems monitored through WSN are constructed from predefined rules or models with certain parameter sets. These parameter sets consequently govern the allowed ranges of all sensor nodes within the system. However, the model holds only under the presumption that all sensor and controlling nodes are working without failures. To cope with potential failures during operation, it is worth placing a so-called early anomaly detection system (EADS) in front of the main processing unit. The EADS should be lightweight and should not overburden the whole system. It need not be highly accurate, but it should provide at least preliminary information on any partial failures in advance. In our previous works, anomaly detection has also been investigated using statistical approaches, in which the interactions among single sensor nodes or clusters of nodes within the system were modeled through weighted “relationships” among the nodes [11, 12]. Unfortunately, that model is computationally exhaustive and requires considerable computing power.
From this point of view, approaches based on “previous” patterns, as adopted in cyber security, might not be appropriate. More deterministic approaches such as the entropy method [10] would be preferable. This paper applies the entropy-based method to an EADS in a sensor network. The sensor nodes may be homogeneous or hybrid with various characteristics, and no interactions among them are assumed. The method only measures the level of irregularity in the system based on the predefined allowed range of each node, as given by its specifications. Irregularities above a certain degree within a cluster or the whole system are interpreted as anomalies. As argued in previous works [13] and references therein, the entropy-based method requires little computing power and is fast enough for anomaly detection, which suits our purpose in the present case.
The paper is organized as follows. The entropy method is briefly explained in Sec. II. Sec. III describes the experiment using the real data set. Sec. IV discusses the comparison with the elliptical method from the previous work of Rajasegarar et al. and closes with a summary.
II Entropy method
Following the seminal work of Shannon [14], entropy is defined as the level of irregularity occurring in a system, in other words, a measure of disorder in the system under consideration. It can be calculated using the master formula [10],
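$$H(X) = -\sum_{i=1}^{n} p_i \log p_i , \tag{1}$$

where

$$p_i = \frac{c_i}{\sum_{j=1}^{n} c_j} \tag{2}$$

is the $i$-th element of the probability set of $X$. Here $s_i$ is the $i$-th element of the accumulated state set $S = \{s_1, \ldots, s_n\}$, with $i = 1, \ldots, n$, which is composed of all non-repetitive states in $X$, while $c_i$ is the $i$-th element of $C$, representing the number of repetitions of $s_i$ in $X$.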
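To make Eqs. (1) and (2) concrete, the sketch below counts the repetitions $c_i$ of each distinct state $s_i$ in a sequence $X$, turns them into probabilities $p_i$, and sums $-p_i \log p_i$. The base-2 logarithm is an assumption (the text does not fix the base here), and the function name `entropy` is hypothetical.

```python
import math
from collections import Counter


def entropy(states, base=2.0):
    """Shannon entropy H(X) of a sequence of states X, following Eqs. (1)-(2).

    The log base is an assumption; base 2 gives the result in bits.
    """
    counts = Counter(states)        # c_i: repetitions of each distinct state s_i
    total = sum(counts.values())    # sum_j c_j, i.e. the length of X
    h = 0.0
    for c in counts.values():
        p = c / total               # p_i = c_i / sum_j c_j, Eq. (2)
        h -= p * math.log(p, base)  # accumulate -p_i log p_i, Eq. (1)
    return h


print(entropy(["a", "a", "b", "b"]))  # two equally likely states: 1 bit
```

A sequence with a single repeated state gives zero entropy, while $n$ distinct states, each occurring once, give the maximum value $\log n$.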
This procedure can now be applied to the real data, and the results can be compared with other methods.
III Experiment
The experiment was conducted using real data from the Intel Berkeley Research Laboratory (IBRL) data set for the acquisition period of March 1st, 2004, from 00:00 to 03:59. This period was chosen following the work of Rajasegarar et al. [15, 16]. Only the temperature and humidity sensors of the 54 MICA2DOT microprocessors were considered, although each of them actually hosts four sensor nodes: temperature, humidity, light and voltage.
As already mentioned, the entropy method itself does not require any correlation between the temperature and humidity sensors. Therefore, the normal range of each sensor series can be determined independently from the manufacturer's specifications, also taking into account the fact that all data from the 37th node and part of the data from the 14th node are considered anomalous. Hence, the normal boundary conditions in the current experiment were set as follows:
- Temperature: … °C to … °C
- Humidity: …
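The binary classification against such a normal range can be sketched as follows. The bounds used in the example are hypothetical placeholders and are NOT the values used in the paper. The convention 0 = normal, 1 = out of range, and the class name `Bounds`, are also assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Normal range of one sensor series, inclusive at both ends."""
    lower: float
    upper: float

    def flag(self, reading: float) -> int:
        """Binary classification: 0 = normal, 1 = outside [lower, upper].

        A NaN reading compares False against both bounds, so it is flagged 1.
        """
        return 0 if self.lower <= reading <= self.upper else 1


# Hypothetical temperature bounds in degrees Celsius, NOT the paper's values.
temperature = Bounds(lower=15.0, upper=35.0)
print([temperature.flag(x) for x in (18.2, 40.1, 14.9, 25.0)])  # [0, 1, 1, 0]
```

Because each series gets its own `Bounds`, temperature and humidity are classified without reference to each other. This mirrors the independence of the sensor series stated above.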
Each sensor series was further divided into 10-minute intervals, and the data in each interval were used to calculate the entropy with Eqs. (1) and (2). This enables the entropy analysis at the node level. The calculation is illustrated in Fig. 1. Each acquisition within a 10-minute interval is evaluated against the normal boundary condition to determine whether it is normal or not, and the sequence of normal and abnormal outcomes within the interval establishes a cumulative value.
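The windowing step can be sketched as follows. The sketch assumes that each reading has already been reduced to a binary flag (0 normal, 1 abnormal) and that the "cumulative value" is the running sum of those flags within a window. Both are interpretations, not details stated in the text. The helper names `window_start` and `cumulative_by_window` are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import accumulate


def window_start(ts: datetime, minutes: int = 10) -> datetime:
    """Floor a timestamp to the start of its window (minutes must divide 60)."""
    return ts - timedelta(minutes=ts.minute % minutes,
                          seconds=ts.second,
                          microseconds=ts.microsecond)


def cumulative_by_window(samples, minutes=10):
    """Group (timestamp, flag) pairs into fixed windows and return, for each
    window start, the running sum of its binary flags in time order."""
    windows = {}
    for ts, flag in sorted(samples):
        windows.setdefault(window_start(ts, minutes), []).append(flag)
    return {start: list(accumulate(flags)) for start, flags in windows.items()}


t0 = datetime(2004, 3, 1, 0, 0)
samples = [(t0, 1), (t0 + timedelta(minutes=3), 0),
           (t0 + timedelta(minutes=7), 1), (t0 + timedelta(minutes=12), 1)]
for start, x in cumulative_by_window(samples).items():
    print(start.strftime("%H:%M"), x)
# 00:00 [1, 1, 2]
# 00:10 [1]
```

Windows are aligned to the clock (00:00, 00:10, ...). This is one simple choice; windows anchored at the first sample would work equally well.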
The following illustrates how the entropy is determined for the experiment in Fig. 1. In the first 10-minute interval of the figure, there are 7 data acquisitions for each of the temperature and humidity parameters. In this interval, all temperature readings exceed the normal boundary. Following [10], each of them is therefore classified as abnormal, and together they construct the cumulative value array $X$. Since each cumulative value occurs only once, the accumulated state set $S$ consists of 7 distinct elements, each with probability $p_i = 1/7$. This leads to $H(X) = -\sum_{i=1}^{7} \frac{1}{7} \log \frac{1}{7} = \log 7$. On the other hand, all of the first humidity readings in Fig. 1 lie inside the boundary, so each of them is classified as normal and the cumulative value array $X$ contains a single repeated value. There is only one cumulative state, with probability $p_1 = 1$, which leads to $H(X) = 0$.
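The worked example for the first interval can be reproduced numerically. The sketch assumes that abnormal readings are flagged 1 and normal ones 0, that the cumulative value array $X$ is the running sum of those flags, and that logarithms are base 2. All three are assumptions, and `window_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter
from itertools import accumulate


def entropy(states, base=2.0):
    """Shannon entropy of a sequence of states, following Eqs. (1)-(2)."""
    h = 0.0
    for c in Counter(states).values():
        p = c / len(states)
        h -= p * math.log(p, base)
    return h


def window_entropy(flags, base=2.0):
    """Entropy of one window from its binary flags (1 = outside the boundary),
    assuming the cumulative value array X is the running sum of the flags."""
    return entropy(list(accumulate(flags)), base)


temperature_flags = [1] * 7  # all 7 temperature readings exceed the boundary
humidity_flags = [0] * 7     # all 7 humidity readings lie inside the boundary

for name, flags in (("temperature", temperature_flags), ("humidity", humidity_flags)):
    x = list(accumulate(flags))
    print(name, x, f"H = {window_entropy(flags):.3f}")
# temperature [1, 2, 3, 4, 5, 6, 7] H = 2.807
# humidity [0, 0, 0, 0, 0, 0, 0] H = 0.000
```

With base-2 logarithms, $\log 7 \approx 2.807$ bits. A natural logarithm would instead give $\ln 7 \approx 1.946$, so only the scale of the entropy axis changes.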
Algorithm 1 describes the procedure illustrated in Fig. 1 step by step.
Note that the 5th and 15th nodes were discarded because their data are missing from the data set. The calculated entropies for each sensor series are plotted in Fig. 2.
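The whole per-node procedure can be sketched end to end as follows. This is an illustrative reconstruction and not the authors' Algorithm 1. It assumes binary flags (1 = out of range), running-sum cumulative values, base-2 logarithms, and 10-minute windows aligned to the clock. The bounds and readings are hypothetical and synthetic. Nodes with no readings are skipped, analogous to the discarded 5th and 15th nodes. Each output pair (H_temperature, H_humidity) is one marker in a two-dimensional entropy plot of the kind shown in Fig. 2.

```python
import math
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from itertools import accumulate


def entropy(states, base=2.0):
    """Shannon entropy of a sequence of states, following Eqs. (1)-(2)."""
    h = 0.0
    for c in Counter(states).values():
        p = c / len(states)          # p_i = c_i / sum_j c_j
        h -= p * math.log(p, base)
    return h


def flag(value, bounds):
    """Binary classification: 1 if value lies outside bounds=(lower, upper)."""
    lower, upper = bounds
    return 0 if lower <= value <= upper else 1


def entropy_points(readings, t_bounds, h_bounds, minutes=10):
    """Turn {node_id: [(timestamp, temperature, humidity), ...]} into
    (node_id, window_start, H_temperature, H_humidity) tuples.

    Nodes with no readings are skipped. `minutes` must divide 60 so that
    windows align with the hour.
    """
    points = []
    for node, rows in sorted(readings.items()):
        if not rows:
            continue  # missing data: node is discarded
        windows = defaultdict(list)
        for ts, temp, hum in sorted(rows):
            start = ts - timedelta(minutes=ts.minute % minutes,
                                   seconds=ts.second,
                                   microseconds=ts.microsecond)
            windows[start].append((flag(temp, t_bounds), flag(hum, h_bounds)))
        for start, flags in sorted(windows.items()):
            # Cumulative value arrays X, assumed to be running sums of the flags.
            x_temp = list(accumulate(f[0] for f in flags))
            x_hum = list(accumulate(f[1] for f in flags))
            points.append((node, start, entropy(x_temp), entropy(x_hum)))
    return points


# Synthetic readings and hypothetical bounds, for illustration only.
t0 = datetime(2004, 3, 1, 0, 0)
data = {
    1: [(t0 + timedelta(minutes=m), 21.0, 40.0) for m in range(0, 10, 2)],
    2: [],  # no readings, so this node is skipped
    3: [(t0 + timedelta(minutes=m), 122.0, -4.0) for m in range(0, 10, 2)],
}
for node, start, h_t, h_h in entropy_points(data, (15.0, 35.0), (0.0, 100.0)):
    print(node, start.strftime("%H:%M"), round(h_t, 3), round(h_h, 3))
# 1 00:00 0.0 0.0
# 3 00:00 2.322 2.322
```

Each series is classified and scored independently. A node can therefore be anomalous in one coordinate while its other coordinate stays at zero.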
In the next section, the result is compared with other methods used in previous works.
IV Discussion and summary
The result of the entropy method in Fig. 2 shows the anomalies scattered over the plane, while the normal data lie on the horizontal and vertical axes. In particular, all data from the 37th node (green triangles) are completely out of range, while only part of the data from the 14th node (red triangles) is recognized as anomalous, as expected. Moreover, the entropy method has successfully detected anomalous data coming from various other nodes.
The current result can be compared with the previous one obtained by Rajasegarar et al. [16] using the elliptical method. That method uses elliptical curves to determine the “normal” region of the data, as illustrated in Fig. 3. In the figure, the confidence level (CL) curve is shown in blue and is generated from the correlation between the data from the temperature and humidity sensor nodes. Further curves with lower CLs can easily be drawn to exclude data that should be outliers. However, it is not trivial to fit the curve so that it separates the allowed and anomalous regions.
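For contrast, the following is a generic elliptical boundary of the kind described above. It fits a Mahalanobis-distance ellipse to the mean and covariance of reference (temperature, humidity) pairs and sets its size by a confidence level. This is a common textbook construction and is not claimed to be the exact method of Rajasegarar et al. [16]. The Gaussian chi-square threshold, the synthetic reference data, and all function names are assumptions for illustration.

```python
import math
from statistics import mean


def fit_ellipse(points):
    """Centre and sample covariance (sxx, sxy, syy) of 2D points, e.g.
    (temperature, humidity) pairs from a reference 'normal' period."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(p, centre, cov):
    """Squared Mahalanobis distance of point p from the centre."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # must be > 0 (non-degenerate covariance)
    dx, dy = p[0] - centre[0], p[1] - centre[1]
    # Inverse of [[sxx, sxy], [sxy, syy]] is [[syy, -sxy], [-sxy, sxx]] / det.
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def outside_ellipse(queries, centre, cov, confidence=0.99):
    """True for each query point outside the ellipse at the given confidence
    level; for 2 degrees of freedom the chi-square quantile is -2 ln(1 - CL)."""
    threshold = -2.0 * math.log(1.0 - confidence)
    return [mahalanobis_sq(q, centre, cov) > threshold for q in queries]


# Synthetic, positively correlated reference data (illustration only).
reference = [(20, 40), (21, 42), (19, 38), (22, 43),
             (18, 37), (20, 41), (21, 40), (19, 39)]
centre, cov = fit_ellipse(reference)
print(outside_ellipse([(20, 40), (22, 44), (22, 36), (30, 40)], centre, cov))
# [False, False, True, True]
```

The points (22, 44) and (22, 36) are equally far from the centre in Euclidean terms. Only the one lying against the fitted correlation is flagged. This shows how strongly such an ellipse depends on the correlation between the two sensor series.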
A more detailed study was conducted by Moshtaghi et al. [17] using the fractional elliptical method, which addresses the inaccuracies in the work of Rajasegarar et al. It used the same data set, but a different time period. The fractional elliptical method better detects outliers lying between the curves. Unfortunately, a one-to-one comparison is not possible because the data periods differ.
Finally, the present paper has shown the results of an EADS using the entropy method and compared them with previous results obtained using the elliptical method. The comparison has been conducted in a two-dimensional space based on the entropies calculated from the temperature and humidity data series. Each entropy value has been calculated from a 10-minute interval of data over the whole period under consideration. It is argued that the entropy method is able to detect anomalies scattered across the space regardless of their pattern, in contrast with, for instance, the elliptical method.
Acknowledgments
AAW thanks the Indonesian Ministry of Research and Technology for financial support, and the Group for Theoretical and Computational Physics, Research Center for Physics LIPI for warm hospitality during the work. LTH thanks the Abdus Salam ICTP for hospitality when the initial part of this work was done. LTH is funded by Riset Unggulan LIPI in fiscal year 2016 under Contract no. 11.04/SK/KPPI/II/2016.
References
- [1] T. Ahram, W. Karwowski, D. Schmorrow, R. L. Boring, K. D. Thomas, T. A. Ulrich, and R. T. Lew, “6th international conference on applied human factors and ergonomics (AHFE 2015) and the affiliated conferences, AHFE 2015 computerized operator support systems to aid decision making in nuclear power plants,” Procedia Manufacturing, vol. 3, pp. 5261–5268, 2015. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S2351978915006058
- [2] D. Hanto, B. Widiyatmoko, B. Hermanto, P. Puranto, and L. T. Handoko, “Real-time inclinometer using accelerometer MEMS,” in Proceeding of the International Conference on Physics and Its Applications for Environmentally Friendly Technology and Disaster Management, 2010.
- [3] R. W. R. Tu, T. R. W. M. Ge, M. Ramatschi, C. Milkereit, D. Bindi, and T. Dahm, “Cost-effective monitoring of ground motion related to earthquakes, landslides, or volcanic activity by joint use of a single-frequency GPS and a MEMS accelerometer,” Geophysical Research Letters, vol. 40, pp. 3825–3829, 2013.
- [4] Y. E. Aslan, I. Korpeoglu, and Ö. Ulusoy, “A framework for use of wireless sensor networks in forest fire detection and monitoring,” Computers, Environment and Urban Systems, vol. 36, pp. 614–625, 2012.
- [5] C. Kavitha and M. Suresh, “Processing Massive Data Streams to Achieve Anomaly Intrusion Prevention,” in Fourth International Conference on Computational Intelligence and Communication Networks, 2012, pp. 948–952.
- [6] A. G. Fragkiadakis, V. A. Siris, N. E. Petroulakis, and A. P. Traganitis, “Anomaly-based intrusion detection of jamming attacks, local versus collaborative detection,” Wireless Communications and Mobile Computing, vol. 15, no. 2, pp. 276–294, 2015.
- [7] S.-W. Lin, K.-C. Ying, C.-Y. Lee, and Z.-J. Lee, “An intelligent algorithm with feature selection and decision rules applied to anomaly intrusion detection,” Applied Soft Computing, vol. 12, no. 10, pp. 3285–3290, 2012.
- [8] W. Xiong, H. Hu, N. Xiong, L. T. Yang, W.-C. Peng, X. Wang, and Y. Qu, “Anomaly secure detection methods by analyzing dynamic characteristics of the network traffic in cloud communications,” Information Sciences, vol. 258, pp. 403–415, 2014.
