Time series classification based on triadic time series motifs
Wen-Jie Xie, Rui-Qi Han, Wei-Xing Zhou

TL;DR
This paper introduces a novel triadic time series motif analysis method that effectively classifies various chaotic and real-world time series, outperforming traditional dynamic time warping in certain cases.
Contribution
It defines six types of triadic motifs and demonstrates their effectiveness in classifying diverse time series datasets with high accuracy.
Findings
Motif profiles can distinguish different chaotic systems.
The method outperforms dynamic time warping on some datasets.
Triadic motifs enhance time series classification accuracy.
Time series classification based on triadic time series motifs
Wen-Jie Xie
Rui-Qi Han
Wei-Xing Zhou
Department of Finance, East China University of Science and Technology, Shanghai 200237, China
Research Center for Econophysics, East China University of Science and Technology, Shanghai 200237, China
Department of Mathematics, East China University of Science and Technology, Shanghai 200237, China
Abstract
It is of great significance to identify the characteristics of time series in order to quantify their similarity. We define six types of triadic time-series motifs and investigate the motif occurrence profiles extracted from the logistic map, chaotic logistic map, chaotic Henon map, chaotic Ikeda map, hyperchaotic generalized Henon map and hyperchaotic folded-tower map. Based on the similarity of motif profiles, we further propose to estimate the similarity coefficients between different time series and classify these time series with high accuracy. We further apply the motif analysis method to the UCR Time Series Classification Archive and provide evidence of good classification ability for some data sets. Our analysis shows that the proposed triadic time series motif analysis performs better than the classic dynamic time warping method in classifying time series for certain data sets investigated in this work.
keywords:
Time series analysis, Classification, Time series motifs, Motif profiles, Dynamic time warping
JEL: C1, P4, Z13
journal: Elsevier
1 Introduction
Quantifying the similarity of time series has always been a very useful primitive for time series analysis, with applications in many fields (Hu et al., 2016; Silva et al., 2015; Gomes and Batista, 2015; Mueen et al., 2009; Mcgovern et al., 2011; Chiu et al., 2003; Mueen and Keogh, 2010; Tataw et al., 2013). The key point of measuring similarity is to define a suitable and effective distance between two time series (Hu et al., 2015; Tarango et al., 2014; Miśkiewicz and Ausloos, 2008). Widely adopted definitions of distance include the Euclidean distance and correlation measures (Miśkiewicz and Ausloos, 2008). However, as a measure of time series similarity, the Euclidean distance often performs only moderately and sometimes poorly (Wang et al., 2013). For most time series analysis problems, dynamic time warping (DTW) provides a highly competitive distance measure (Silva et al., 2018; Wang et al., 2013). To get the best performance from DTW, its single parameter, the warping window width, needs to be tuned (Dau et al., 2018b). The computational complexity of DTW is relatively high, so many researchers have proposed improved variants with better performance (Petitjean et al., 2016; Dau et al., 2017; Mueen and Keogh, 2016). Moreover, practitioners have generalized DTW to multi-dimensional time series classification (Shokoohi-Yekta et al., 2017).
Similar subsequences in time series can be defined as time series motifs, which characterize the temporal properties and dynamics of the corresponding long time series (Mcgovern et al., 2011; Chiu et al., 2003; Mueen and Keogh, 2010). Motifs are useful for exploratory data mining and are often used as inputs for the classification, clustering and segmentation of time series (Bagnall et al., 2017; Mori et al., 2017; Dau et al., 2016; Petitjean et al., 2014). Time series motif analysis has been widely used in diverse fields (Yeh et al., 2018; Zhu et al., 2018; Linardi et al., 2018a, b; Yeh et al., 2017; Zakaria et al., 2016). Gomes and Batista presented a SAX-based motif discovery method to classify urban sounds (Gomes and Batista, 2015). Wang et al. proposed a method to automatically detect repeating segments in music and in two time series data sets (Wang et al., 2010). Son and Anh introduced two novel methods to discover approximate $k$-motifs in time series data (Son and Anh, 2016), and motif discovery with their methods plays an important role in several time series data mining tasks. Many researchers have applied time series motif analysis in a variety of domains (Mueen et al., 2009; Mcgovern et al., 2011; Chiu et al., 2003; Mueen and Keogh, 2010).
Triadic time series motifs (Xie et al., 2019b) are inspired by the network motifs in visibility graphs (Lacasa et al., 2009, 2008; Ni et al., 2009; Yang et al., 2009; Elsner et al., 2009; Qian et al., 2010) and horizontal visibility graphs (HVGs) mapped from time series (Lacasa et al., 2009; Elsner et al., 2009; Lacasa and Toral, 2010; Shao, 2010; Dong and Li, 2010; Ahmadlou et al., 2010; Tang et al., 2010; Xie et al., 2017, 2019a). The six triadic time series motifs share some features with sequential HVG motifs (Iacovacci and Lacasa, 2016b, a) and ordinal patterns (Keller and Sinn, 2005; McCullough et al., 2015, 2017; Zhang et al., 2017). The permutation entropy based on ordinal patterns (Bandt and Pompe, 2002; Amigó, 2010) is a natural complexity measure and is useful in the presence of dynamical or observational noise. Similarly, triadic time series motif analysis can also mine the dynamical characteristics of time series from complex systems. Xie et al. (2019b) used triadic time series motif analysis to uncover the different dynamics in the heartbeat rates of healthy subjects, congestive heart failure subjects, and atrial fibrillation subjects, and to identify bullish and bearish markets from the price fluctuations of financial markets.
In this work, we identify six triadic time series motifs and investigate their occurrence profiles in time series from logistic maps with different control parameters and in chaotic time series generated from the chaotic logistic map, chaotic Henon map, chaotic Ikeda map, hyperchaotic generalized Henon map and hyperchaotic folded-tower map. It is of great significance to be able to discover the characteristics of time series from different types of chaotic maps. We also apply the triadic time series motif analysis to classify the time series in 128 data sets from the UCR Time Series Classification Archive (Dau et al., 2018a).
2 Triadic time series motifs
Triadic time series motifs are determined by the relative magnitude and ordinal order of three data points that are randomly chosen from the time series (Xie et al., 2019b). For three arbitrary data with in the time series , a time series motif forms if the following conditions are fulfilled (Xie et al., 2019b):
[TABLE]
We obtain six triadic time series motifs, which are denoted as , , , , , in Fig. 1. This definition does not consider situations where two or three data points of are equal. When two data points are identical, we treat it as if the latter data point is larger than the former one.
The time series motifs are different from the conventional motifs of horizontal visibility graphs (Lacasa and Toral, 2010; Lacasa et al., 2009; Shao, 2010; Dong and Li, 2010; Ahmadlou et al., 2010; Elsner et al., 2009; Tang et al., 2010; Xie and Zhou, 2011; Xie et al., 2017, 2019a). For triadic HVG motifs, there are only two admissible motifs in undirected HVGs, one being a chain and the other being a triangle. As shown in Fig. 1, the open triadic motif can be mapped from the time series , , and and the closed triadic motif can be mapped from the time series and . Time series motifs consider not only the visibility between data points, as HVG motifs do, but also the order and relative magnitudes of the points. Hence, time series motifs explore finer structures of HVG motifs (Xie et al., 2019b).
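As a rough illustration of how a motif occurrence profile might be computed, the following Python sketch counts triples $i<j<k$ in which both $(i,j)$ and $(j,k)$ are horizontally visible, and labels each triple by the rank order of its three values. The visibility rule (every intermediate value strictly below both endpoints), the index-based tie-breaking, and the names `hvg_neighbors` and `motif_profile` are assumptions made for this example, since the defining conditions are not reproduced above. The rank tuples used as labels are hypothetical and are not the paper's motif names.

```python
from itertools import permutations


def hvg_neighbors(x):
    """For each index, list the earlier and later indices it can 'see'.

    Assumed visibility rule: i and j are horizontally visible if every
    value strictly between them is smaller than min(x[i], x[j]).
    """
    n = len(x)
    left = [[] for _ in range(n)]
    right = [[] for _ in range(n)]
    for i in range(n):
        highest = float("-inf")  # largest value strictly between i and j
        for j in range(i + 1, n):
            if highest < min(x[i], x[j]):
                right[i].append(j)
                left[j].append(i)
            highest = max(highest, x[j])
            if highest >= x[i]:
                break  # x[i] is blocked from everything further right
    return left, right


def motif_profile(x):
    """Occurrence frequencies of the six rank patterns of visible triples."""
    left, right = hvg_neighbors(x)
    counts = {pattern: 0 for pattern in permutations(range(3))}
    for j in range(len(x)):
        for i in left[j]:
            for k in right[j]:
                # Ties are broken by index: a later equal value ranks higher.
                keyed = [(x[i], i), (x[j], j), (x[k], k)]
                order = sorted(range(3), key=lambda m: keyed[m])
                ranks = [0, 0, 0]
                for rank, position in enumerate(order):
                    ranks[position] = rank
                counts[tuple(ranks)] += 1
    total = sum(counts.values())
    if total == 0:
        return {pattern: 0.0 for pattern in counts}
    return {pattern: c / total for pattern, c in counts.items()}


# Example: profile of a period-2 series (values chosen arbitrarily).
profile = motif_profile([0.5, 0.8] * 50)
```

Under these assumptions, and for series without repeated values, the two patterns whose middle value is the smallest, `(1, 0, 2)` and `(2, 0, 1)`, correspond to triangles in the HVG and the other four to chains, which agrees with the four-versus-two split described above. The double loop over neighbors is chosen for readability rather than speed.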
3 Triadic time series motif analysis of chaotic maps
3.1 Chaotic maps
We perform triadic time series motif analysis numerically for different time series in continuous and discrete dynamic systems. Through extensive numerical experiments, we investigate the motif distribution extracted from the logistic map, the chaotic logistic map, the chaotic Henon map, the chaotic Ikeda map, the hyperchaotic generalized Henon map, and the hyperchaotic folded-tower map.
The logistic map is a representative example of how complex, chaotic behaviour can arise from a very simple nonlinear dynamical equation. Mathematically, the logistic map is written as
[TABLE]
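A minimal sketch of how such a series could be generated, assuming the standard form $x_{t+1} = r\,x_t(1-x_t)$ of the logistic map. The symbol `r` for the control parameter, the burn-in length, and the function name `logistic_series` are choices made for this illustration and are not taken from the paper.

```python
import random


def logistic_series(r, length, x0=None, burn_in=1000, seed=None):
    """Iterate x -> r * x * (1 - x) and return `length` values after a burn-in.

    `burn_in` discards the initial transient so that the returned values
    lie on (or near) the attractor; its default is an arbitrary choice.
    """
    rng = random.Random(seed)
    x = rng.uniform(0.01, 0.99) if x0 is None else x0
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    series = []
    for _ in range(length):
        x = r * x * (1.0 - x)
        series.append(x)
    return series


periodic = logistic_series(3.2, 10, x0=0.3)   # settles on a two-value cycle
chaotic = logistic_series(3.9, 512, seed=42)  # irregular, bounded in (0, 1)
```

Drawing the initial condition at random, as done here when `x0` is omitted, is one simple way to obtain many distinct series for the same parameter value.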
To distinguish chaotic maps and hyperchaotic maps, we generate four types of time series from chaotic Henon map, chaotic Ikeda map, hyperchaotic generalized Henon map and hyperchaotic folded-tower map. The specific equations for these four types of dynamic systems are given below (Xu et al., 2008). Mathematically, the chaotic Henon map is written as
[TABLE]
where and . The chaotic Ikeda map is written as
[TABLE]
where and . The hyperchaotic generalized Henon map is written as
[TABLE]
where and . The hyperchaotic folded-tower map is written as
[TABLE]
where and .
3.2 Occurrence frequency distributions of triadic motifs
We first generate time series by using the logistic map with control parameter . The parameter ranges in the interval of . When , will approach permanent oscillations between two values from almost all initial conditions. When , will approach permanent oscillations among four values from almost all initial conditions. When , , or , exhibits chaotic behaviour. For each parameter , we generate 200 time series with length , determine the occurrence frequencies of the six triadic time series motifs, and obtain the occurrence frequency distribution of each motif. Fig. 2 shows the distributions of the occurrence frequency of the motif in the logistic time series with , , , , and . We find that the five classes of time series have very different occurrence frequency distributions of time series motifs.
When the parameter , the time series is . Without loss of generality, we assume that . The set of motif is the union of and with , so that the occurrence count of is
[TABLE]
The sets of motifs , and are respectively the union of , of , and of , where . It follows that
[TABLE]
By definition, motifs and cannot appear. Therefore, the occurrence frequencies
[TABLE]
are obtained as follows
[TABLE]
This analytical result is verified by the numerical simulations, as shown in Fig. 2 (red bars).
When the parameter , the time series is slightly more complicated. As shown in Fig. 2, the occurrence frequency distribution of each motif is concentrated, with a small or even zero variance. When , the oscillation period becomes longer and longer until, at about , the period tends to infinity and the system becomes chaotic. When , the iterated dynamics switch between periodic and chaotic behaviour. At , the system is fully chaotic. Although these cases are all chaotic, the occurrence frequency distributions for in Fig. 2 differ markedly.
We further perform triadic time series motif analysis of the four types of discrete chaotic time series: chaotic Henon map, chaotic Ikeda map, hyperchaotic generalized Henon map and hyperchaotic folded-tower map. In Fig. 3, the length of the time series is 512 and there are large differences in the occurrence frequency of motif and motif between the four types of discrete chaotic time series. We expect that the longer the time series is, the larger the differences in the occurrence frequencies of the individual motifs will be, and the easier it will be to distinguish the four types of discrete chaotic time series. In Fig. 3, we cannot use a single motif's occurrence frequency to distinguish the different types of chaotic time series, whereas in Fig. 2 a single indicator suffices to distinguish the five types of logistic time series. This method can be understood as a dimension reduction method, which reduces a time series with length to a point in a 5-dimensional space, because the six occurrence frequencies sum to one.
3.3 Classification of time series
The triadic motif analysis is applied to the classification of time series to investigate its effectiveness as a similarity measure. In order to classify different time series, we need to extract their features. The triadic time series motifs are used as the features, and the time series are then classified based on the motif occurrence frequency distributions. Intuitively, the longer the time series, the more information the feature extraction method obtains and the more accurately the time series can be classified. Therefore, we consider the influence of the time series length on the classification accuracy. We compare the motif profile with two classical methods for measuring the similarity of time series: one is the simple Euclidean distance, and the other is dynamic time warping (DTW). We use the nearest neighbor method (1NN) to classify the time series based on each of the three similarity measures.
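The nearest neighbor step can be sketched as follows. This is only an illustration: it assumes that each time series has already been reduced to a feature vector (for example a six-component motif occurrence profile) and that profiles are compared with the Euclidean distance. The paper's own similarity coefficient is not reproduced here, and the function names and toy vectors are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def predict_1nn(train_features, train_labels, query, distance=euclidean):
    """Return the label of the training vector closest to `query`."""
    best = min(range(len(train_features)),
               key=lambda idx: distance(train_features[idx], query))
    return train_labels[best]


def accuracy_1nn(train_features, train_labels, test_features, test_labels,
                 distance=euclidean):
    """Fraction of test vectors whose nearest training neighbor has the right label."""
    hits = sum(
        predict_1nn(train_features, train_labels, q, distance) == y
        for q, y in zip(test_features, test_labels)
    )
    return hits / len(test_labels)


# Toy, made-up profiles for two classes (not data from the paper).
train_x = [[0.40, 0.20, 0.20, 0.20, 0.00, 0.00],
           [0.20, 0.15, 0.15, 0.20, 0.15, 0.15]]
train_y = ["periodic", "chaotic"]
test_x = [[0.38, 0.21, 0.20, 0.21, 0.00, 0.00],
          [0.19, 0.16, 0.15, 0.19, 0.16, 0.15]]
test_y = ["periodic", "chaotic"]
acc = accuracy_1nn(train_x, train_y, test_x, test_y)
```

Passing a different `distance` callable allows the same 1NN routine to be reused with other similarity measures operating on the raw series.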
We analyze respectively the time series generated from the logistic map with parameter , , , and and from the four chaotic maps. For each type of time series, we generate 100 time series as the training set and 100 time series as the test set. To analyze the accuracy of classification of time series with different lengths , the length of time series is changed from to , and then the data sets are classified by the nearest neighbor method based on three similarity measures. The three colors (red, green, blue) in Fig. 4 (A) and (C) correspond to the three similarity measures: the motif occurrence profile, the DTW and the Euclidean distance. The red dot indicates the accuracy of classification of the 1NN method based on the motif occurrence profile. The green dot indicates the accuracy of classification of the 1NN method based on the DTW. The blue dot indicates the accuracy based on the Euclidean distance. The ordinate represents the discriminant correctness rate based on the training set and the test set. The abscissa represents the time series length .
In general, the DTW-based classification accuracy is the best and the motif profile method performs slightly worse, especially when the time series length is less than 200. The Euclidean distance method is the worst. Usually, the longer the time series, the more information the methods extract. However, the accuracy of the classification based on the Euclidean distance decreases as the time series length increases. This is mainly because the Euclidean distance calculation is simple: when the time series is longer, there is more noise, which is not conducive to capturing the similarity between time series. The Euclidean distance method performs relatively well when the time series length is less than 50. When the time series length is greater than 200, we find that and , indicating that the DTW-based method and the motif profile method proposed in this paper can completely distinguish the different time series. The Euclidean distance method performs poorly, no better than random classification, for which the accuracy is , where is the number of categories in the data set. The logistic map series have 5 categories, so we have . The discrete chaotic time series have 4 categories, so we have .
In order to analyze the accuracy of classification in the case of data loss, we perform the same analysis on the time series after data deletion. The length of the original time series is . We randomly delete a proportion () of the data from each time series. We then classify the remaining data and calculate the classification accuracy rates. This process is repeated 10 times and the average classification accuracy rates , and are obtained. Fig. 4 (B) and (D) show the relationship between the data deletion rate and the classification accuracy rates , and . Overall, , and decrease with increasing . We observe that the Euclidean distance method performs as the random classification, with for the logistic maps and for the chaotic time series. For the logistic maps, the motif profile method is more robust to data deletion than the DTW method. When the data deletion rate is close to 20%, the motif-based classification accuracy can still reach 100%, while the DTW-based classification accuracy rate drops to about 80%. In contrast, for the chaotic time series, the DTW method outperforms the motif profile method. The DTW classification method is very robust to data deletion and its accuracy rate is close to 100% even when the data deletion rate is as high as 50%. Not surprisingly, each method (DTW or motif profile) has its own advantages and disadvantages. Different methods usually have different performances when they are applied to different time series.
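The deletion experiment can be sketched as below. This is a sketch under stated assumptions: the deleted positions are drawn uniformly without replacement, the remaining points keep their original order, and `score` is a hypothetical callable standing in for whatever quantity (such as a classification accuracy) is averaged over the repetitions.

```python
import random


def delete_fraction(series, fraction, rng):
    """Remove a random `fraction` of the points, keeping the original order."""
    n_remove = int(round(fraction * len(series)))
    removed = set(rng.sample(range(len(series)), n_remove))
    return [v for idx, v in enumerate(series) if idx not in removed]


def mean_over_repeats(series, fraction, score, repeats=10, seed=0):
    """Average `score` over several independent random deletions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repeats):
        total += score(delete_fraction(series, fraction, rng))
    return total / repeats


original = list(range(100))
shortened = delete_fraction(original, 0.2, random.Random(1))
average_length = mean_over_repeats(original, 0.2, len)
```

Here `len` is used as a trivial stand-in for `score`; in an actual experiment it would be replaced by a function that classifies the shortened series and reports the accuracy.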
4 Triadic time series motif analysis of the UCR Time Series Classification Archive
To test the effectiveness of this method as a similarity measure, we use it to classify real-world time series. The data come from the UCR Time Series Classification Archive (Dau et al., 2018a), which contains 128 data sets, each divided into a training set and a test set. The archive website also reports the classification accuracy of three methods. The first method uses the nearest neighbor method to classify based on the Euclidean distance. The accuracy rate is expressed by , where the highest rate is 100% (Coffee) and the lowest rate is 5.77% (PigAirwayPressure). The second method uses the nearest neighbor method to classify based on DTW. The accuracy rate is represented by , where the highest rate is 100% (Trace, Two Patterns, Plane, Coffee) and the lowest rate is 10.58% (PigAirwayPressure). The third method is based on an improvement of DTW. Previous research results show that DTW describes the similarity of time series very well and is more effective than the Euclidean distance, but its computational complexity is high. We apply our method to the 128 data sets and compare the results with those obtained from the first and second methods.
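For reference, a textbook dynamic-programming form of DTW is sketched below. It is not the implementation behind the archive's reported results. The squared pointwise cost, the final square root, and the interpretation of `window` as a band of allowed index offsets are assumptions of this sketch.

```python
import math


def dtw_distance(a, b, window=None):
    """DTW distance between sequences a and b with an optional warping window.

    `window` limits |i - j| for aligned indices; None means unconstrained.
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    window = max(window, abs(n - m))  # the band must reach the final cell
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        lo = max(1, i - window)
        hi = min(m, i + window)
        for j in range(lo, hi + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])           # stretched copy
pointwise = dtw_distance([0, 1, 2], [1, 1, 1], window=0)      # no warping allowed
```

The table has one cell per pair of positions, so the unconstrained cost grows with the product of the two lengths, which is the source of the complexity concern mentioned above. With `window=0` and equal lengths the result reduces to the Euclidean distance.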
Fig. 5 shows the radar charts of the motif occurrence profiles averaged within different classes of time series in six representative data sets. Each radar chart corresponds to a data set. Each solid line in a radar chart represents the average motif occurrence profile of one category of time series in the data set. The time series belonging to the same class in the training set and the test set are included in the averaging process. It can be seen that the six radar charts are very different, implying that the motif profile method can classify different data sets effectively. For each data set, the difference between the profile lines in the corresponding radar chart represents the difference between different time series. The classification will be more accurate if the difference is larger. The three radar charts in the top panel of Fig. 5 have many profile lines that are not sufficiently separated, which indicates that it would be hard to distinguish those categories. In contrast, each of the three radar charts in the bottom panel of Fig. 5 has only two profile lines that are well separated, which indicates that the two categories can be well distinguished. Indeed, the classification accuracy is low for the former data sets and high for the latter data sets (see also Fig. 6).
We use the triadic motif occurrence profile as the characteristic time series feature to classify the 128 data sets. The classification accuracy is shown in Fig. 6. For 11 data sets, our method is better than the DTW method, since . The 11 data sets are shown in the lower right triangle of Fig. 6. Fig. 6 also compares the classification accuracy of the motif profile method and the Euclidean distance method. There are 18 data sets satisfying , indicating that our method performs better than the Euclidean distance method for these data sets. For instance, for the data set SmallKitchenAppliances, our method is 50% more accurate than the Euclidean distance method. In general, the DTW method does a very good job of measuring time series similarity, but our method is superior to it for some data sets.
5 Conclusions
It is of great significance to be able to discover the characteristics of time series from a unique perspective through novel methods. Here, we studied the characteristics of time series through triadic time series motifs. We defined six types of triadic time series motifs. The simulation analysis finds that the distributions of the motif occurrence frequencies corresponding to logistic maps and chaotic time series (chaotic Henon map, chaotic Ikeda map, hyperchaotic generalized Henon map and hyperchaotic folded-tower map) all have their own characteristics. The motif occurrence profiles can quantify the time series characteristics in different dynamical systems and show classification power comparable to that of the DTW method.
We applied the motif analysis to the UCR data sets. The advantage of the Euclidean distance method is that the calculation is simple and fast. The DTW method performs best overall, but on some data sets its performance is not as good as that of the motif profile method. Our method has better accuracy than the DTW method for 11 data sets.
The starting point of our method is completely different from that of the Euclidean distance method and the DTW method. It is rooted in complex networks and mines features from the time series. We expect that it can be improved in future research to provide a more effective method for measuring the similarity of time series. Indeed, there are many methods for extracting motifs from time series, and different motif extraction methods can describe different time series features. In order to improve the practicality of our method, we will develop different motif recognition methods to measure time series similarity.
Acknowledgements
This work was supported by National Natural Science Foundation of China (11505063, 71532009, U1811462) and Fundamental Research Funds for the Central Universities (222201818006).
References
- Ahmadlou, M., Adeli, H., Adeli, A., 2010. New diagnostic EEG markers of the Alzheimer’s disease using visibility graph. J. Neural Transm. 117, 1099–1109.
- Amigó, J., 2010. Permutation Complexity in Dynamical Systems. Springer-Verlag Berlin Heidelberg.
- Bagnall, A. J., Lines, J., Bostrom, A., Large, J., Keogh, E. J., 2017. The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min. Knowl. Discov. 31 (3), 606–660.
- Bandt, C., Pompe, B., 2002. Permutation entropy: a natural complexity measure for time series. Phys. Rev. Lett. 88 (17), 174102.
- Chiu, B., Keogh, E., Lonardi, S., 2003. Probabilistic discovery of time series motifs. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 493–498.
- Dau, H. A., Begum, N., Keogh, E. J., 2016. Semi-supervision dramatically improves time series clustering under dynamic time warping. In: Proceedings of the 25th ACM International Conference on Information and Knowledge Management, CIKM 2016, Indianapolis, IN, USA, October 24-28, 2016. pp. 999–1008.
- Dau, H. A., Keogh, E., Kamgar, K., Yeh, C.-C. M., Zhu, Y., Gharghabi, S., Ratanamahatana, C. A., Yanping, Hu, B., Begum, N., Bagnall, A., Mueen, A., Batista, G., October 2018a. The UCR Time Series Classification Archive. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/
- Dau, H. A., Silva, D. F., Petitjean, F., Forestier, G., Bagnall, A. J., Keogh, E. J., 2017. Judicious setting of dynamic time warping’s window width allows more accurate classification of time series. In: 2017 IEEE International Conference on Big Data, Big Data 2017, Boston, MA, USA, December 11-14, 2017. pp. 917–922.
