Trajectory Functional Boxplots
Zonghui Yao, Wenlin Dai, and Marc G. Genton

TL;DR
This paper introduces novel visualization tools for trajectory functional data, enabling effective detection of shape and magnitude outliers, validated on hurricane and bird migration data.
Contribution
It proposes the trajectory functional boxplot and MSBD-WO plot, new tools for visualizing and detecting outliers in trajectory functional data.
Findings
The WO index effectively detects shape outliers.
MSBD provides a ranking for magnitude outliers.
Tools successfully applied to hurricane and bird migration data.
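Table 1: Empirical correct detection rate $p_c$ and false detection rate $p_f$ for the three methods, with standard deviations in parentheses.

| Method | Model 1: $p_c$ | Model 1: $p_f$ | Model 2: $p_c$ | Model 2: $p_f$ |
|---|---|---|---|---|
| RMD | 1 (0) | 0.11 (0.003) | 1 (0) | 0.28 (0.003) |
| MSBD | 0.67 (0.02) | 0 (0) | 0.25 (0.02) | 0 (0) |
| WO | 1 (0) | 0 (0) | 1 (0) | 0.12 (0.006) |

| Method | Model 3: $p_c$ | Model 3: $p_f$ | Model 4: $p_c$ | Model 4: $p_f$ |
|---|---|---|---|---|
| RMD | 1 (0) | 0 (0) | 1 (0) | 0 (0) |
| MSBD | 0.75 (0.04) | 0.4 (0) | 1 (0) | 0.4 (0) |
| WO | 1 (0) | 0 (0) | 1 (0) | 0 (0) |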
Correspondence: Wenlin Dai, Institute of Statistics and Big Data, Renmin University of China, Beijing, China.
Statistics Program, King Abdullah University of Science and Technology, Thuwal 23955–6900, Saudi Arabia
Institute of Statistics and Big Data, Renmin University of China, Beijing, China
(26 April 2019; 6 June 2019; 6 June 2019)
Abstract
With the development of data-monitoring techniques in various fields of science, multivariate functional data are often observed. Consequently, an increasing number of methods have appeared to extend the general summary statistics of multivariate functional data. However, trajectory functional data, as an important sub-type, have not been studied very well. This article proposes two informative exploratory tools, the trajectory functional boxplot, and the modified simplicial band depth (MSBD) versus Wiggliness of Directional Outlyingness (WO) plot, to visualize the centrality of trajectory functional data. The newly defined WO index effectively measures the shape variation of curves and hence serves as a detector for shape outliers; additionally, MSBD provides a center-outward ranking and works as a detector for magnitude outliers. Using these two measures, the functional boxplot of the trajectory reveals center-outward patterns and potential outliers using the raw curves, whereas the MSBD-WO plot illustrates such patterns and outliers in a space spanned by MSBD and WO. The proposed methods are validated on hurricane path data and migration trace data recorded from two types of birds.
keywords:
Data visualization, Depth, Magnitude and shape, Multivariate functional data, Ranking, Outlier detection
1 Introduction
Due to the rapid progress in data-monitoring techniques and the Internet, the volume of data has experienced explosive growth. Functional data are commonly recorded in various fields, including, but not limited to, medical imaging, meteorology, biology, and engineering. Examples include temperature and precipitation records at weather stations, handwriting data in different languages, and absorption curves of some medical ingredients. Depending on the dimension of the response at each point of observation, such data are categorized as univariate or multivariate functional data. Functional data analysis has attracted great attention over the last two decades [23, 9, 15]; we refer the readers to Wang et al. [29] for a recent review. Most research focuses on the univariate case, leaving the multivariate case less explored.
Here, we focus on trajectory data, an important type of multivariate functional data. Trajectory data usually record the positions of objects during a specific time window and commonly appear in many important research areas. We provide three examples in Figure 1 that include the hurricane paths from a predictive model [4] and the migration paths of two types of birds [8, 26]. We develop tools for exploratory analysis specifically for this type of data.
During the boom of functional data analysis, many summary statistics and inference techniques have been generalized from traditional to functional data. However, trajectory data have not been sufficiently investigated, and the corresponding ranking methods, outlier detection, and visualizations remain open questions. Most existing exploratory analysis methods for functional data are based on the concept of statistical depth, which was initially a potent tool for ranking multivariate data but also does well in describing the centrality of functional data. Several depth notions have been proposed to rank multivariate functional data, e.g., weighted modified band depth (WMBD; Ieva and Paganoni, 2013), simplicial band depth and modified simplicial band depth (SBD and MSBD; López-Pintado et al., 2014); they are the prevailing methods for producing a plausible center-outward ordering. Dai and Genton [6] introduced the directional outlyingness for detecting outliers in multivariate functional data.
Outlier detection is another crucial step in the analysis of data. The well-known types of functional outliers include persistent outliers, isolated outliers, magnitude outliers, and shape outliers (Hubert et al., 2015). The first three types of outliers can be handled by the simplicial band depth. However, shape outlier detection is a more challenging task. Shape outliers are defined as trajectories exhibiting a different shape from the rest of the sample. The outliergram (Arribas-Gil and Romo, 2014) is one choice for shape outlier detection, based on the modified epigraph index and the modified band depth, but its authors demonstrate its capacity only in the univariate case. Dai and Genton [6] combined the magnitude and shape outlyingness by forming vectors of the mean of directional outlyingness (MO) and the variation of directional outlyingness (VO), and then calculated their robust Mahalanobis distance (RMD) with the minimum covariance determinant estimator of [24]. They defined the outliers as those curves whose RMD values exceed a specific threshold. However, this method cannot detect the two types of outliers, shape and magnitude, separately, which leads to large false detection rates.
Visualization tools are commonly used to illustrate the properties of the analyzed data. For functional data, various tools have been developed, such as functional bagplots and functional highest density region plots (Hyndman and Shang, 2010), functional boxplots (Sun and Genton, 2011), and surface boxplots (Genton et al., 2014). These plots give a good description of the functional data and show each curve directly with different labels. Another type of plot is based on the magnitude versus shape index of each curve and shows the centrality of the data through scatterplots. Outliergrams [2], functional outlier maps (Rousseeuw et al., 2018), and magnitude-shape plots (Dai and Genton, 2018) are some examples. Yet, a good visualization tool for trajectory functional data is lacking.
In this paper, we propose two visualization tools for trajectory functional data analysis. Specifically, we develop the “Wiggliness of Directional Outlyingness” (WO), which performs very well in detecting shape outliers in trajectory functional data. Based on the results, we first construct a trajectory functional boxplot that visualizes the raw curves with different percentage bands and outliers; we then provide a scatterplot, the MSBD-WO plot, presenting the magnitude and shape properties of each curve.
The remainder of the paper is organized as follows: Section 2 introduces trajectory functional data and commonly used methods for curve ranking and outlier detection. Section 3 provides the two visualization tools constructed using a new measure of centrality defined especially for trajectory functional data. Section 4 compares the performance of the proposed procedures with several outlier detection methods in a series of simulation studies, and Section 5 presents three data applications of the proposed tools. A conclusion is provided in Section 6.
2 Trajectory Functional Data
Trajectory functional data naturally appear in many situations, such as weather forecasting, ecological studies, and handwriting inputs. They are special forms of multivariate functional data. The main difference is that, instead of visualizing the data along time, the data are mapped in a sub-space by removing the time axis. Figure 1(a) shows classical hurricane trajectory data that record the locations of hurricanes with time. Instead of showing the graph in 3D, we plot the trajectories on a 2D map. We can treat trajectory functional data as a $p$-dimensional stochastic process $\mathbf{X}$, where $\mathbf{X}$ is defined on a compact interval $\mathcal{I}$. In the hurricane path example, $p = 2$. Often, the trajectories of a sample share approximately the same starting and/or ending points; otherwise, an alignment step should be implemented before analyzing the data.
2.1 Multivariate Curve Ranking
A natural way to rank these trajectory functional data is to use a depth notion for multivariate functional data to make a center-outward ordering for the curves that provides a robust description of the data structure. Here, we consider the following two tools: the simplicial band depth (SBD) (López-Pintado et al., 2014) and the directional outlyingness (Dai and Genton, 2019) to perform the ranking. Other possible methods for ranking multivariate functional data include multivariate functional halfspace depth [3] and high-order integrated or infimal depth [22].
2.1.1 Simplicial Band Depth
The simplicial band depth (SBD) (López-Pintado et al., 2014) is defined as
[TABLE]
where we use a random in defined by . It measures the probability that the random regions in decided by random simplices at time contain .
Because a curve is unlikely to be completely contained in a simplex, López-Pintado et al. [19] relaxed the strict containment requirement and formed a modified simplicial band depth (MSBD) as
[TABLE]
where the measure is the Lebesgue measure on $\mathcal{I}$ divided by the length of the interval $\mathcal{I}$. This depth therefore measures the proportion of time during which a trajectory is contained in the simplices determined by the randomly drawn curves.
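As an illustration of the idea behind MSBD, the following is a minimal sketch of a sample version for bivariate trajectories ($p = 2$), where each simplex is a triangle spanned by three sample curves at a given time point. The function names `msbd` and `_in_triangle`, the brute-force loop over all triples, and the inclusion of triples that contain the curve itself are assumptions made for this example, not the authors' implementation.

```python
from itertools import combinations

import numpy as np


def _in_triangle(pt, a, b, c, eps=1e-12):
    """Return True if the 2-D point pt lies in the closed triangle (a, b, c)."""
    def cross(o, u, v):
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])

    d1, d2, d3 = cross(a, b, pt), cross(b, c, pt), cross(c, a, pt)
    has_neg = min(d1, d2, d3) < -eps
    has_pos = max(d1, d2, d3) > eps
    # The point is inside (or on the boundary) when the signs do not conflict.
    return not (has_neg and has_pos)


def msbd(curves):
    """Sample MSBD sketch for curves of shape (n, T, 2).

    For each curve, average over all triples of sample curves the proportion
    of time points at which the curve lies inside the triangle they span.
    Cost grows like n^3 * T, so this is only suitable for small samples.
    """
    curves = np.asarray(curves, dtype=float)
    n, n_times, _ = curves.shape
    triples = list(combinations(range(n), 3))
    depth = np.zeros(n)
    for i in range(n):
        total = 0.0
        for j, k, l in triples:
            inside = sum(
                _in_triangle(curves[i, t], curves[j, t], curves[k, t], curves[l, t])
                for t in range(n_times)
            )
            total += inside / n_times
        depth[i] = total / len(triples)
    return depth
```

Larger values indicate more central trajectories, so sorting by decreasing depth gives a center-outward ordering.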
2.1.2 Directional Outlyingness
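Let $\mathbf{X}$ be a $p$-dimensional function defined on a domain $\mathcal{I}$. We define $d(\mathbf{X}(t), F_{\mathbf{X}(t)})$ as a depth function for $\mathbf{X}(t)$ with respect to $F_{\mathbf{X}(t)}$, which denotes the distribution of the random variable $\mathbf{X}(t)$, and $o(\mathbf{X}(t), F_{\mathbf{X}(t)})$ as the corresponding outlyingness of $\mathbf{X}(t)$ with respect to $F_{\mathbf{X}(t)}$.
In order to capture the shape as well as magnitude outliers, Dai and Genton [6] introduced the following definition for directional outlyingness:
$$\mathbf{O}(\mathbf{X}(t), F_{\mathbf{X}(t)}) = \left\{ \frac{1}{d(\mathbf{X}(t), F_{\mathbf{X}(t)})} - 1 \right\} \cdot \mathbf{v}(t),$$
where $\mathbf{v}(t)$ is the unit vector pointing from the median of $F_{\mathbf{X}(t)}$ to $\mathbf{X}(t)$, $\mathbf{v}(t) = \{\mathbf{X}(t) - \mathbf{Z}(t)\}/\|\mathbf{X}(t) - \mathbf{Z}(t)\|$, and $\mathbf{Z}(t)$ stands for the median of the distribution $F_{\mathbf{X}(t)}$. Dai and Genton [6] suggested using distance-based depths, e.g., Mahalanobis depth or projection depth [31], to construct the directional outlyingness.
Dai and Genton [6] defined two indices that measure the outlyingness of functional data: the mean of directional outlyingness (MO) and the variation of directional outlyingness (VO). In actual situations, we have only a finite set of time points. Therefore, the discretized versions of MO and VO, computed on the observed time points, are the measures used in real applications.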
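To make the construction concrete, here is a minimal sketch of pointwise directional outlyingness and of discretized MO and VO on an equally spaced time grid. It assumes a Mahalanobis-type depth of the form $1/(1 + m^2)$ built from the classical sample mean and covariance (so the depth-based factor reduces to the squared Mahalanobis distance $m^2$ and the sample mean plays the role of the median), and a constant weight over time. The function names are hypothetical, and a robust estimator or projection depth could be substituted.

```python
import numpy as np


def directional_outlyingness(curves):
    """Pointwise directional outlyingness for curves of shape (n, T, p).

    Assumption: depth d = 1 / (1 + m^2), with m the Mahalanobis distance to
    the sample mean, so that 1/d - 1 = m^2. The direction is the unit vector
    from the sample mean (used here as the center) to the observation.
    """
    curves = np.asarray(curves, dtype=float)
    n, n_times, p = curves.shape
    out = np.zeros_like(curves)
    for t in range(n_times):
        x = curves[:, t, :]
        diff = x - x.mean(axis=0)
        cov = np.atleast_2d(np.cov(x, rowvar=False))
        m2 = np.einsum("ij,jk,ik->i", diff, np.linalg.pinv(cov), diff)
        norm = np.linalg.norm(diff, axis=1)[:, None]
        unit = np.divide(diff, norm, out=np.zeros_like(diff), where=norm > 0)
        out[:, t, :] = m2[:, None] * unit
    return out


def mo_vo(out):
    """Discretized MO (n, p) and VO (n,) with a constant weight over time."""
    mo = out.mean(axis=1)
    vo = ((out - mo[:, None, :]) ** 2).sum(axis=2).mean(axis=1)
    return mo, vo
```

A curve that is shifted away from the bulk of the sample at every time point produces a large norm of MO, while a curve whose outlyingness changes over time produces a large VO.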
2.2 Outlier Detection
When the underlying dataset is possibly contaminated, the detection of outliers becomes an important step of exploratory data analysis. For functional data, the existing outlier detection rules consist of three different subtypes: discarding a prefixed proportion of data with respect to the depth values [10], using graphical tools based on the raw curves [17, 27, 30], and approximating the distribution of the depth (or its transformation) values [25, 6]. We use two of them that belong to the last two categories, respectively.
2.2.1 Simplicial Band Depth Criteria
The empirical cutoff rule is formed by a constant factor times the height of the 50% central region ranked by the depth, where the factor is usually chosen based on the simulation studies conducted by [27, 28]. Under the MSBD criterion, curves that cross this threshold are identified as outliers.
2.2.2 Robust Mahalanobis Distance Criteria
Besides setting a cutoff value according to the functional depth distribution, Dai and Genton (2019) showed that the joint distribution of MO and VO could be asymptotically approximated by a $(p+1)$-dimensional Gaussian distribution if $\mathbf{X}$ was generated from a $p$-dimensional stationary Gaussian process. They used the robust squared Mahalanobis distance:
[TABLE]
where the estimates are computed from the group of points that minimizes the determinant of the corresponding covariance matrix.
The tail of the following distribution can be approximated by the Fisher $F$-distribution:
[TABLE]
where the two parameters are calculated by an algorithm of [14]. Consequently, the outliers are those whose RMD values exceed the 0.993 quantile of this $F$-distribution. Under the RMD criterion, the VO part contains the variation properties of the curves. However, its importance decreases as the dimension increases. Overall, the RMD value is a synthesized index for shape and magnitude outliers.
3 Trajectory Functional Data Visualization Tools
3.1 Wiggliness of Directional Outlyingness
Recall that trajectory functional data record the traces of movements from a group of objects, so the most interesting and most common differences between the curves come from the variations of their shapes. Thus, we propose a new tool that specifically detects shape outliers from trajectory functional data, and call it wiggliness of directional outlyingness. Assuming that the outlyingness function is twice differentiable, we first compute the integral of the squared second-order derivative of directional outlyingness, then use its norm, as follows:
[TABLE]
where $w(t)$ is a weight function on $\mathcal{I}$, and $\mathbf{O}''$ is the vector of the second-order derivatives of each component of the directional outlyingness function with respect to time. We choose $w(t)$ as a constant weight function in this paper. The existence of second-order derivatives can be guaranteed if the trajectories are smooth and projection depth or Mahalanobis depth is applied to derive the directional outlyingness. When the trajectories are observed with random errors, we may approximate them with smoothing splines.
The second-order derivative is often used to describe the “wiggliness” of functions. In the smoothing spline model, the integrated squared second-order derivative is a classical penalty term for roughness. From this perspective, WO is good at capturing wiggly behavior, and is therefore an effective way to detect shape outliers, especially for curves that have large shape variability but lie close to the center.
3.2 Properties of WO
We study some properties of WO in the following theorem.
Theorem 1 (Transformation invariance). Let be a functional, having expression , where with and an orthogonal matrix, and is a -dimensional vector, for each . Then,
[TABLE]
We provide the proof of Theorem 1 in the Appendix.
In applications, we usually calculate WO on a finite set of time points for a finite sample of trajectories. Therefore, we use the following sample version to calculate WO:
[TABLE]
where the second-order derivatives are approximated by order-2 differences on the observed time grid.
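A minimal sketch of such a sample version is given below. It assumes equally spaced time points, a constant weight, and one particular reading of the definition: each component of the directional outlyingness is differenced twice, squared, averaged over time, and the Euclidean norm of the resulting $p$-vector is returned. The function name `wiggliness` and the argument `dt` are hypothetical.

```python
import numpy as np


def wiggliness(out, dt=1.0):
    """Sample WO sketch for directional outlyingness of shape (n, T, p).

    Second-order derivatives are approximated by central second differences
    on an equally spaced grid with spacing dt; a constant weight is used.
    """
    out = np.asarray(out, dtype=float)
    second = (out[:, 2:, :] - 2.0 * out[:, 1:-1, :] + out[:, :-2, :]) / dt**2
    integral = (second**2).mean(axis=1)  # shape (n, p), one value per component
    return np.linalg.norm(integral, axis=1)
```

Under this reading, an outlyingness function that is linear in time has zero WO, whereas one that oscillates rapidly has a large WO regardless of how close the curve is to the center.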
Next, we study the distribution of WO when $\mathbf{X}$ is generated from a Gaussian random process, which is the most common case. We assume that $\mathbf{X}$ is generated from a bivariate stationary Gaussian process with zero mean and a Matérn cross-covariance function [13, 1],
[TABLE]
where the correlation functions belong to the Matérn class [20]. With the covariance parameters held fixed, we generate two groups of 5000 and 10000 samples observed at a common set of time points.
We calculate the WO values and apply the log transformation. The distribution of $\log(\mathrm{WO})$ can be approximated by a normal distribution, as shown in Figure 2. After normalizing the resulting values, we can approximate the cutoff value by a Gaussian quantile. For example, we can view the $i$-th sample as a potential outlier if
$$\frac{\log(\mathrm{WO}_i) - \mathrm{med}\{\log(\mathrm{WO})\}}{\mathrm{MAD}\{\log(\mathrm{WO})\}} > \Phi^{-1}(\alpha), \qquad (1)$$
where $\Phi$ denotes the standard normal cumulative distribution function, med denotes the median, and MAD denotes the median absolute deviation. Thus, the cutoff value for outliers can be set by controlling $\alpha$, and we can vary $\alpha$ under different situations to visualize the changes of the flagged outliers. A commonly used value for $\alpha$ is 0.975. This method focuses mainly on the detection of the outliers and, as shown in Section 4, is not suitable for constructing a ranking of the curves that exhibits a reasonable geometric structure.
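The rule can be sketched in a few lines. This is an illustrative version only: the function name `flag_shape_outliers` is hypothetical, and the factor `mad_scale=1.4826` (which makes the MAD consistent with the standard deviation under normality) is an assumption of this example; set it to 1.0 to use the raw MAD.

```python
from statistics import NormalDist

import numpy as np


def flag_shape_outliers(wo, alpha=0.975, mad_scale=1.4826):
    """Flag curves whose normalized log(WO) exceeds the alpha Gaussian quantile.

    wo must contain strictly positive values, and at least half of the
    log-values must differ from their median so that the MAD is non-zero.
    """
    log_wo = np.log(np.asarray(wo, dtype=float))
    med = np.median(log_wo)
    mad = mad_scale * np.median(np.abs(log_wo - med))
    z = (log_wo - med) / mad
    return z > NormalDist().inv_cdf(alpha)
```

Raising `alpha` toward 1 flags fewer curves, which is how the sensitivity of the detection can be explored.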
3.3 Trajectory Functional Boxplots
We first construct a box-type plot for trajectory functional data, named trajectory functional boxplot, that visualizes different levels of central regions, as well as the outliers. Concretely, the trajectory functional boxplot is constructed through the following procedure:
1. Detecting outliers using criterion (1) and setting the outliers aside from the dataset;
2. Ranking the remaining data with MSBD to get the center-outward ordering;
3. Plotting the median and bands formed by a specific percentage (e.g., 25%, 50%, 75%) of data with different colors, and then adding the outliers back to the plot.
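The bookkeeping behind this procedure can be sketched as follows, assuming that depth values and outlier flags have already been computed. The function name `central_regions` and the label strings are hypothetical; depth values supplied for flagged curves are ignored, and in practice the depth would be recomputed on the retained curves before calling it.

```python
import numpy as np


def central_regions(depth, is_outlier, levels=(0.25, 0.50, 0.75)):
    """Label each curve as 'outlier', 'median', '25%', '50%', '75%', or 'outer'."""
    depth = np.asarray(depth, dtype=float)
    is_outlier = np.asarray(is_outlier, dtype=bool)
    labels = np.full(depth.shape[0], "outer", dtype=object)
    labels[is_outlier] = "outlier"                      # step 1: set outliers aside
    kept = np.flatnonzero(~is_outlier)
    order = kept[np.argsort(-depth[kept], kind="stable")]  # step 2: deepest first
    start = 0
    for level in levels:                                # step 3: nested bands
        stop = int(np.ceil(level * len(order)))
        labels[order[start:stop]] = f"{int(round(level * 100))}%"
        start = stop
    if len(order) > 0:
        labels[order[0]] = "median"
    return labels
```

Each label can then be mapped to a color when drawing the raw curves, with the flagged curves added back last.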
We provide one example of a trajectory functional boxplot in Figure 3. The raw data in Figure 3(a) are generated from Model 2 in the simulation study, where we introduce four shape outliers (the red curves). Figure 3(b) shows the trajectory functional boxplot constructed following the above procedure. The outliers detected by WO are presented as dashed red curves, and the median curve is the solid black curve; the different levels of central regions, derived by MSBD, are in purple (25%), magenta (50%), and pink (75%). The combination of WO and MSBD makes the trajectory functional boxplot advantageous for both the construction of central regions and the detection of shape outliers.
3.4 MSBD-WO Plot
Another tool proposed in this paper is the MSBD-WO plot, which is a scatterplot of the points (MSBD, WO), as shown in Figure 3(c). This scatterplot can be used to visualize the MSBD and WO values of each curve. We expect the most central curves with little shape variability to lie in the bottom-right region of the graph (small WO and large MSBD). The central curves with a large shape variability are mapped to the upper-right region (large WO and large MSBD). The outlying curves with a large shape variability correspond to the upper-left region (large WO and small MSBD), and the outlying curves with a small shape variability correspond to the lower-left region (small WO and small MSBD). Another possibility would be to combine RMD and MSBD. However, MSBD-RMD plots would not be able to distinguish shape outliers and magnitude outliers as well as MSBD-WO plots because a small MSBD would lead to a large RMD.
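The reading of the four regions can be expressed as a small helper. This is only a sketch: the function name and the two thresholds `msbd_cut` and `wo_cut` are hypothetical, and the plot itself does not prescribe hard cutoffs for MSBD.

```python
def msbd_wo_region(msbd_value, wo_value, msbd_cut, wo_cut):
    """Map one (MSBD, WO) point to the region of the plot it falls in."""
    central = msbd_value >= msbd_cut
    wiggly = wo_value >= wo_cut
    if central and not wiggly:
        return "bottom-right: central, small shape variability"
    if central and wiggly:
        return "upper-right: central, large shape variability"
    if wiggly:
        return "upper-left: outlying, large shape variability"
    return "lower-left: outlying, small shape variability"
```

For instance, a curve with a large depth but a WO value above the chosen cutoff would be reported as central with large shape variability, which is the kind of curve a depth-only rule tends to miss.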
4 Simulation Studies
To assess the effectiveness of our method for the detection of outliers, we conduct a series of simulation studies. We also compare our method with the other outlier detection methods described in Section 2.2. To investigate the performance of an outlier detector, we use two common measures: $p_c$, the true positive rate (the number of correctly detected outliers divided by the total number of outlying curves), and $p_f$, the false positive rate (the number of falsely detected outliers divided by the number of non-outlying curves). We consider the following four models of trajectories with various shapes and types of contamination.
4.1 Simulation Design
Model 1: Shape outliers with small variations
The main body includes 70 lines with different slopes, as follows:
[TABLE]
We add three contaminated outliers, with the first two near the center with larger variations (shape outliers). The third outlier is far from the center, and exhibits the same variations as the first two (outlying for both shape and magnitude):
[TABLE]
An example of trajectories from Model 1 is presented in Figure 4(a).
Model 2: Shape outliers with large variation
We generate a sinusoid function and rotate it through the following rotation matrix:
[TABLE]
We add four outliers with , and rotate them by , where . An example of trajectories from Model 2 is presented in Figure 5(a).
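The rotation step can be illustrated with a short sketch. The amplitude, the time grid, and the rotation angle used below are hypothetical choices for illustration and are not the settings of Model 2; only the mechanism (a sinusoid multiplied by a standard 2D rotation matrix) follows the description above.

```python
import numpy as np


def rotated_sinusoid(theta, amplitude=1.0, n_points=50):
    """Return the curve (t, amplitude * sin t), rotated counterclockwise by theta."""
    t = np.linspace(0.0, 2.0 * np.pi, n_points)
    curve = np.column_stack([t, amplitude * np.sin(t)])
    rot = np.array(
        [[np.cos(theta), -np.sin(theta)],
         [np.sin(theta), np.cos(theta)]]
    )
    # Each row (x, y) of curve is mapped to rot @ (x, y).
    return curve @ rot.T
```

Varying `theta` across curves produces a fan of sinusoids, and changing `amplitude` for a few of them yields curves with a different degree of shape variation.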
Model 3: Classical closed-shape outliers
We generate a series of circles with increasing radius and noise:
[TABLE]
where for . The noises are generated from a standard normal distribution, and for . The contaminations include one circle with larger noises and three ellipses with the same level of noise as the non-outlying curves. An example of trajectories from Model 3 is presented in Figure 6(a).
Model 4: Special closed-shape outliers
This model has the same main body as Model 3, but is contaminated differently. Specifically, we add rose curves with different leaves as outlying observations. An example of trajectories from Model 4 is presented in Figure 7(a).
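The two kinds of closed curves used in Models 3 and 4 can be sketched as follows. The radii, noise level, and number of leaves are hypothetical illustration values rather than the simulation settings, and the polar form $r = \cos(k\theta)$ is one common parametrization of a rose curve.

```python
import numpy as np


def noisy_circle(radius, n_points=100, noise_sd=0.05, rng=None):
    """Circle of the given radius with independent Gaussian noise on both coordinates."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.linspace(0.0, 2.0 * np.pi, n_points)
    xy = radius * np.column_stack([np.cos(theta), np.sin(theta)])
    return xy + noise_sd * rng.standard_normal(xy.shape)


def rose_curve(radius, leaves, n_points=100):
    """Rose curve r = radius * cos(leaves * theta) in Cartesian coordinates."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_points)
    r = radius * np.cos(leaves * theta)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])
```

A contaminated sample could then be assembled by stacking several calls to `noisy_circle` with increasing radii and a few calls to `rose_curve` with different values of `leaves`.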
We run the simulation with 1000 replications and evaluate the empirical $p_c$, $p_f$, and their standard deviations. A good performance is usually defined as a high correct detection percentage $p_c$ and a low false detection percentage $p_f$. For the simplicial band depth criterion, the constant factor is based on the previous simulation studies by [27, 28]. In the RMD criterion, we set the cutoff value through the two parameters computed by the algorithm of [14]; in the WO criterion, the cutoff for the detection of outliers is set through $\alpha$.
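The two performance measures are simple ratios, as in the following sketch for a single replication. The function name `detection_rates` is hypothetical; averaging its output over replications gives the empirical rates.

```python
def detection_rates(flagged, truth):
    """Return (p_c, p_f) for one replication.

    flagged[i] is True if curve i was detected as an outlier;
    truth[i] is True if curve i really is an outlier.
    """
    true_pos = sum(f and t for f, t in zip(flagged, truth))
    false_pos = sum(f and not t for f, t in zip(flagged, truth))
    n_out = sum(truth)
    n_in = len(truth) - n_out
    p_c = true_pos / n_out if n_out else float("nan")
    p_f = false_pos / n_in if n_in else float("nan")
    return p_c, p_f
```

For example, with two true outliers of which one is found, and one false alarm among four regular curves, the function returns `(0.5, 0.25)`.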
4.2 Outlier Detection and Visualization
In general, after ranking the data by different criteria, we choose the most central 25% curves, 25%-50% curves, and 50%-75% curves as our 25%, 50%, and 75% bands, respectively. The outliers under the different criteria are defined in Sections 2 and 3. Figures 4-7 show the plots generated by the different criteria. As we can see from Figure 4, MSBD gives a reasonable ranking sequence, from inside to outside. However, the shape outliers in the middle are not easy to detect, which leads to a low correct detection rate $p_c$. On the other hand, MSBD is unlikely to falsely flag curves in Model 1. RMD does well in discovering all the shape outliers and provides a high $p_c$. Nevertheless, it shows a higher false detection rate $p_f$ because it considers both the magnitude and shape parts of the abnormality.
It is worth noting that, for RMD, the ranking results for the 50% and 75% bands seem chaotic and irregular, and do not provide a good ranking sequence for constructing a boxplot. For WO, the performance on the detection of shape outliers is excellent, as it shows a high $p_c$ and a low $p_f$. However, the ranking sequence, in this case, is also disordered. Therefore, it is inappropriate to construct the body part of the boxplots using WO.
Overall, RMD combines the shape and magnitude behaviors of curves, but MSBD and WO, in this simple case, are more advantageous for ranking sequences and detecting shape outliers, respectively. In Model 1, we demonstrate that our WO criterion has a good performance in detecting shape outliers among the simple straight lines. The patterns in the first five sub-plots of Figure 4 are slightly different because we did not show the 75%-100% band for each detection method; this also applies to Figures 5-7.
Concerning the MSBD-WO plots, the properties of magnitude and shape variability for each curve can be seen on the x-axis and y-axis, respectively. From left to right, the depth value increases, with the curves moving from the outside to the center. The black rhombus point has the largest depth value, and therefore stands for the median. From bottom to top, the curves show more and more shape variation and are more likely to be detected as shape outliers.
In Model 2, we find that, under comparatively large variations (the sinusoid curves versus straight lines with variations), the shape outlier detection procedures still perform well for RMD and WO, but the drawback remains that the ranking results for the curves do not give a sequence from the center outward. The 50% and 75% bands reverse their order under both the RMD and WO criteria. RMD shows a higher false detection rate (Figure 5 or Table 1), whereas WO shows a fairly good false detection rate. Their medians also seem unreasonable. The advantage of MSBD remains that it provides a reasonable ranking sequence; however, it has a very high false detection rate and many non-outlying curves are detected as outliers. Therefore, combining the advantages of MSBD and WO gives us a good performance in both the ranking sequence and shape outlier detection, resulting in the trajectory functional boxplot shown in Figure 5(e). The simulation study gives similar results, and shows the robustness of WO in detecting shape outliers with higher variability. Such open curves arise in many applications, for example migration paths.
Besides the open curves discussed in Models 1 and 2, we investigate the performance of these methods for closed curves. Closed curves have many real applications in medical diagnosis (e.g., vascular malformation). We test the performance of the outlier detection criteria for closed curves in Models 3 and 4. As we can see from Figures 6 and 7 and Table 1, the outlier detection results of WO are still good for closed curves; this indicates the robustness of our method for the detection of shape outliers. MSBD also performs well in ranking the functional data, and gives a favorable ranking sequence. It also provides a reasonable median curve, compared to WO and RMD, but it shows an unsatisfying classification of the non-outlying curves. RMD’s performance is similar to that for the first two models.
Overall, the WO shows its strength in detecting the shape outliers, whereas MSBD can always give a better ranking sequence. In principle, this phenomenon is understandable because MSBD defines outliers as the curves exceeding a certain threshold distance from the center, but it considers the shape variabilities less. Thus, it is reasonable to combine the strengths of both criteria to build our trajectory functional boxplots.
In Table 1, we report the simulation results based on 1000 replicates comparing the three methods in detecting shape outliers. It shows the good performance of WO, with a high $p_c$ value and a low $p_f$ value, and with small standard deviations. RMD gives good results for the detection of outliers, but its $p_f$ value is high, with large standard deviations, in Models 1 and 2. MSBD, in many cases, does not have good results for $p_c$ and $p_f$. In practice, users can modify the $\alpha$ value to see the changes in the detected outliers.
5 Data Applications
Besides simulation studies, we examine the two visualization tools, the trajectory functional boxplot and the MSBD-WO plot, on three datasets. Our datasets contain open-straight trajectory functional data and mixtures of open and closed trajectories.
5.1 Hurricane Paths
The first dataset consists of hurricane paths. The whole dataset contains 1000 trajectories of longitude and latitude recorded along five common time points in the Caribbean Sea. Because hurricane path predictions are of interest to many researchers, the authors of [4] established an algorithm for generating an ensemble of hurricane paths, based on historical data. The authors of [21] constructed a curve boxplot using 50 hurricane tracks simulated with the same algorithm. The raw trajectories are shown in Figure 1(a). It is evident that direct visualization gives more information about the uncertainty of the hurricane path. A hurricane path can be seen as bivariate functional data, for which the explanatory variable is time, and the two response variables are the longitude and latitude.
We apply the two visualization tools to assess the centrality of hurricane paths and set a series of values for $\alpha$ ranging from 0.9 to 0.99; a visuanimation [11] of the results is presented in Movie 1. The black curve represents the median ranked by MSBD, which is the rightmost point in the MSBD-WO plot. Purple curves represent the 25% band, magenta curves represent the 50% band, and pink curves represent the 75% band. We can find their ranking sequence in the MSBD-WO plot. The red curves are the outliers based on the WO criterion.
Some of the red curves lying in the 50% central region are detected as outliers due to their shape variability. The curves from different central regions exhibit some differences in length, but the more obvious difference is the width of their spread. The outlying curves behave more wildly than the central ones, which makes these curves longer. Overall, the above findings are consistent with our conclusions in the simulation study of Model 1 for open-straight trajectories. Movie 1 shows how the detected shape outliers change as the $\alpha$ value varies from 0.9 to 0.99. The recommended value in real applications is 0.975, as discussed in the simulation studies, but users have the flexibility to change it.
The trajectory functional boxplot is a valuable tool to visualize hurricane paths, and to give warning to people living nearby. People who live in the 50% central region may experience severe damage due to hurricanes. Therefore, it is sensible to evacuate the population before the landing of the hurricane. People also receive more information about possible outlying paths. Those who live in Texas may experience the effects of dangerous hurricanes, even if they are not covered by the 50% and 75% central regions. The central outlying trajectories show that there is a significant probability that a hurricane may turn westward, even if it has already landed in Alabama.
5.2 Migration Patterns
We consider ecological applications to two datasets of migration patterns: the Tsinghua waterfowl data and the petrel distribution data.
The Tsinghua waterfowl dataset is from Movebank (Si et al., 2018). It contains spring migration patterns, habitat use, and stop-over site protection status for two declining waterfowl species wintering in China, as revealed by satellite tracking. It has GPS information about the routes of the waterfowl. In this case, the paths are complex. Some waterfowl may stay at some place for a few days (which means their paths have different lengths). Some make a round trip (viewed as closed curves), and some have straight trajectories (viewed as open curves).
In this study, we view all the routes as bivariate functional data along time. After some necessary cleaning of the data, we choose 24 bird migration trajectories as our raw data. Because the recording frequencies are different, some of them have 1000 time-point records for longitude and latitude, whereas others only have 200 points. Therefore, we use a cubic smoothing spline to fit the different trajectories, and choose 200 common time points for all 24 birds. The raw trajectories are shown in Figure 1(b). Further, we align these curves so that they start from the same spot. With the chosen cutoff value of $\alpha$, we obtain the trajectory functional boxplot and the corresponding MSBD-WO plot shown in Figure 8.
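The resampling and alignment steps can be sketched as below. Note the substitution: this example uses plain linear interpolation (`numpy.interp`) on a normalized time axis as a simple stand-in for the cubic smoothing spline described above, and it aligns curves by subtracting the starting position. The function name `resample_and_align` is hypothetical.

```python
import numpy as np


def resample_and_align(track, n_points=200):
    """Resample a track of shape (m, 2) to n_points and shift it to start at the origin.

    Linear interpolation replaces the cubic smoothing spline of the text;
    the records are assumed to be ordered and equally spaced in time.
    """
    track = np.asarray(track, dtype=float)
    s_old = np.linspace(0.0, 1.0, track.shape[0])
    s_new = np.linspace(0.0, 1.0, n_points)
    resampled = np.column_stack(
        [np.interp(s_new, s_old, track[:, k]) for k in range(track.shape[1])]
    )
    return resampled - resampled[0]
```

Applying the function to every bird yields an array of shape (24, 200, 2) with a common starting point, which is the input format assumed by depth and outlyingness computations on a common grid.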
The trajectory functional boxplot gives a meaningful representation of waterfowl migration routes and provides more information for studying and observing their behavior from an ecological perspective. Specifically, more stations can be built in the region covered by the 50% band to record the birds' migration patterns. The unusual outlying migration paths may be due to bad weather or a natural disaster. Based on these results, biologists may investigate the birds' behavior further, according to the different categories.
The second dataset comes from Descamps et al. (2016), who studied the impact of an extensive fishery for Antarctic krill (Euphausia superba) on marine ecosystems and, more specifically, the influence of fishing on the petrel, a predator of Antarctic krill. The study recorded the positions of predators breeding near the fishing grounds as well as those breeding far away. Positions of the birds during the non-breeding season are also included. This dataset involves complex and irregular trajectories, depicted in Figure 1(c). Figure 9 shows the trajectory functional boxplot and the MSBD-WO plot constructed from these data.
For data preprocessing, we apply the same data-cleaning and smoothing procedures as above and choose 124 paths as our processed data. In this case, however, the trajectories are more irregular: some are twisted curves and some are closed curves, which poses significant challenges to our method. We also set in this case.
As in the simulation study of Models 3 and 4, we find that our trajectory functional boxplot detects the shape outliers well; they show large variations but are located within the central regions, as shown by the red curves in Figure 9. The magenta 50% band contains the routes where petrels fly not far from the continent, and the pink 75% band includes the routes where petrels fly either very far away or close to the origin. Outliers are straightforward to see in our trajectory functional boxplot. The central outliers deserve more attention because their routes seem quite irregular and twisted. It appears that the fishery industry has a more significant influence on these petrels. Overall, our method serves as a way to separate different flying patterns. The trajectory functional boxplot is helpful for studying the behavior patterns of the petrels according to their assigned categories in the plot.
6 Conclusion
We introduced two novel exploratory tools, the trajectory functional boxplot and the MSBD-WO plot, for visualizing the centrality of trajectory functional data and detecting outliers in them. To detect abnormal observations, we proposed a criterion focusing on shape outliers; the MSBD provides a ranking that reveals a nested structure and gives a more informative and robust description of the bulk of the data. The practical performance of the tools was assessed using the hurricane path, waterfowl migration, and petrel distribution datasets.
Trajectory functional data may have covariates as well, for example, the wind speed of the hurricane. These covariates can be included in the ranking based on directional outlyingness for multivariate functional data. Moreover, various data transformations can be considered to improve the rankings further, as investigated by Dai, Mrkvička, Sun, and Genton (2018).
Acknowledgements
We thank Dr. Donald H. House and his group at Clemson University for sharing the ensemble hurricane generator code. The research reported in this paper was supported by King Abdullah University of Science and Technology (KAUST).
Appendix
Proof of Theorem 1:
According to Theorem 1 of Dai and Genton (2019), we have the following result for at a fixed time point:
[TABLE]
If the function has a second order derivative then its smoothness is retained through a rotation by the orthogonal matrix . Denoting , it is then obvious that
[TABLE]
since is orthogonal. For a constant weight function , we conclude that
[TABLE]
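The key step of the argument, that rotating a curve by an orthogonal matrix leaves the norm of its second derivative unchanged, can be checked numerically. The sketch below uses a discrete roughness measure (the sum of squared norms of second differences) as a stand-in; it is not the WO index itself, and the helper names `roughness` and `rotate` as well as the test curve are hypothetical.

```python
import math

def roughness(curve):
    """Sum of squared Euclidean norms of the second differences of a 2D curve."""
    total = 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(curve, curve[1:], curve[2:]):
        ddx = x0 - 2 * x1 + x2
        ddy = y0 - 2 * y1 + y2
        total += ddx * ddx + ddy * ddy
    return total

def rotate(curve, angle):
    """Apply the 2x2 rotation matrix (an orthogonal matrix) to every point."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in curve]

# A smooth toy curve and a rotated copy of it.
curve = [(t / 50, math.sin(4 * t / 50)) for t in range(51)]
rotated = rotate(curve, 0.7)

# The two roughness values agree up to floating-point error.
same = math.isclose(roughness(curve), roughness(rotated), rel_tol=1e-6)
```

Because second differences are linear in the curve, rotating the curve rotates each second-difference vector, and an orthogonal matrix preserves Euclidean norms; this is why the two values coincide.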
- 1. Apanasovich, T. V., Genton, M. G., & Sun, Y. (2012). A valid Matérn class of cross-covariance functions for multivariate random fields with any number of components. Journal of the A
- 2. Arribas-Gil, A., & Romo, J. (2014). Shape outlier detection and visualization for functional data: the outliergram. Biostatistics, 15(4), 603–619.
- 3. Claeskens, G., Hubert, M., Slaets, L., & Vakili, K. (2014). Multivariate functional halfspace depth. Journal of the American Statistical Association, 109, 411–423.
- 4. Cox, J., & Lindell, M. (2013). Visualizing uncertainty in predicted hurricane tracks. International Journal for Uncertainty Quantification, 3(2), 143–156.
- 5. Dai, W., & Genton, M. G. (2018). Multivariate functional data visualization and outlier detection. Journal of Computational and Graphical Statistics, 27, 923–934.
- 6. Dai, W., & Genton, M. G. (2019). Directional outlyingness for multivariate functional data. Computational Statistics & Data Analysis, 131, 50–65.
- 7. Dai, W., Mrkvička, T., Sun, Y., & Genton, M. G. (2018). Functional outlier detection and taxonomy by sequential transformations. arXiv e-prints, arXiv:1808.05414.
- 8. Descamps, S., Tarroux, A., Cherel, Y., Delord, K., Godø, O. R., Kato, A., … others (2016). At-sea distribution and prey selection of Antarctic petrels and commercial krill fisheries. PLoS ONE, 11(8), e0156968.
