The conditionally autoregressive hidden Markov model (CarHMM): Inferring behavioural states from animal tracking data exhibiting conditional autocorrelation
Ethan Lawler, Kim Whoriskey, William H. Aeberhard, Chris Field, Joanna, Mills Flemming

TL;DR
This paper introduces the CarHMM, a novel extension of the hidden Markov model designed to better analyze animal movement data exhibiting conditional autocorrelation, with practical guidelines and biological interpretations.
Contribution
The paper develops the CarHMM, addressing limitations of traditional HMMs by modeling conditional autocorrelation in animal movement data, and provides comprehensive analysis guidelines.
Findings
CarHMM effectively models conditional autocorrelation in movement data.
Guidelines for data preprocessing, model selection, and checking are provided.
Interpretation of behavioral states in biological terms is enhanced.
Abstract
One of the central interests of animal movement ecology is relating movement characteristics to behavioural characteristics. The traditional discrete-time statistical tool for inferring unobserved behaviours from movement data is the hidden Markov model (HMM). While the HMM is an important and powerful tool, sometimes it is not flexible enough to appropriately fit the data. Data for marine animals often exhibit conditional autocorrelation, self-dependence of the step length process which cannot be explained solely by the behavioural state, which violates one of the main assumptions of the HMM. Using a grey seal track as an example, along with multiple simulation scenarios, we motivate and develop the conditionally autoregressive hidden Markov model (CarHMM), which is a generalization of the HMM designed specifically to handle conditional autocorrelation. In addition to introducing and…
| State 2 | ||||
| Low | Med | High | ||
| Low | (0.131, 0.148) [4] | (0.111, 0.128) [2] | (0.074, 0.087) [0] | |
| State 1 | Med | (0.139, 0.161) [5] | (0.082, 0.098) [0] | |
| High | (0.209, 0.240) [0] | |||
| State 2 () | |||||
|---|---|---|---|---|---|
| 0.5 | 0.6 | 0.7 | 0.9 | ||
| 0.5 | (0.214, 0.234) [1] | (0.204, 0.227) [0] | (0.188, 0.207) [1] | (0.125, 0.154) [2] | |
| State 1 | 0.6 | (0.226, 0.240) [3] | (0.204, 0.223) [2] | (0.194, 0.210) [0] | (0.127, 0.147) [2] |
| () | 0.7 | (0.224, 0.245) [2] | (0.207, 0.223) [2] | (0.196, 0.218) [2] | (0.115, 0.134) [0] |
| 0.9 | (0.213, 0.242) [6] | (0.179, 0.226) [6] | (0.154, 0.180) [1] | (0.082, 0.104) [2] | |
| Two State Model | |||
|---|---|---|---|
| Simulated Model | HMM | CarHMM | |
| Fitted | HMM | (0.120, 0.138) [8] | (0.434, 0.474) [0] |
| Model | CarHMM | (0.125, 0.145) [7] | (0.072, 0.083) [0] |
| Three State CarHMM Parameter Estimates | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| State 1 | State 2 | State 3 | State 1 | State 2 | State 3 | ||||||
| 0.398 | 1.291 | 2.074 | -0.129 | -0.050 | 0.002 | 0.713 | 0.287 | 0.000 | |||
| 0.277 | 0.781 | 0.961 | 0.402 | 0.780 | 0.906 | 0.149 | 0.797 | 0.054 | |||
| 0.279 | 0.318 | 0.164 | 0.000 | 0.120 | 0.880 | ||||||
| 0.264 | 0.508 | 0.228 | |||||||||
| Two State Elk Parameters | ||||||||
| State 1 | State 2 | State 1 | State 2 | |||||
| 3.364 | 0.355 | 0 | 0 | 0.75 | 0.25 | |||
| 0 | 0 | 0.228 | 0.6 | 0.15 | 0.85 | |||
| 4.329 | 0.378 | |||||||
| Three State Seal Parameters | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| State 1 | State 2 | State 3 | State 1 | State 2 | State 3 | ||||||
| 0.202 | 0.998 | 2.091 | 0 | 0 | 0 | 0.848 | 0.142 | 0.005 | |||
| 0.04 | 0.429 | 0.945 | 0.209 | 0.681 | 0.867 | 0.065 | 0.754 | 0.164 | |||
| 0.157 | 0.529 | 0.235 | 0.005 | 0.164 | 0.831 | ||||||
| Two State “Best Practice” Seal Parameters | ||||||||
|---|---|---|---|---|---|---|---|---|
| State 1 | State 2 | State 1 | State 2 | |||||
| 0.552 | 1.731 | 0 | 0 | 0.826 | 0.174 | |||
| 0.407 | 0.892 | 0.508 | 0.858 | 0.12 | 0.88 | |||
| 0.351 | 0.244 | |||||||
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Contents
- 1 Introduction
- 2 CarHMM formulation
- 3 Interpretation of CarHMM parameters
- 4 Data inspection and model checking
- 5 Simulation Study
- 6 Best practice analysis of a male grey seal track
- 7 Discussion
- A Extra Supplementary Material
Abstract
One of the central interests of animal movement ecology is relating movement characteristics to behavioural characteristics. The traditional discrete-time statistical tool for inferring unobserved behaviours from movement data is the hidden Markov model (HMM). While the HMM is an important and powerful tool, sometimes it is not flexible enough to appropriately fit the data. Data for marine animals often exhibit conditional autocorrelation, self-dependence of the step length process which cannot be explained solely by the behavioural state, which violates one of the main assumptions of the HMM. Using a grey seal track as an example, along with multiple simulation scenarios, we motivate and develop the conditionally autoregressive hidden Markov model (CarHMM), which is a generalization of the HMM designed specifically to handle conditional autocorrelation.
In addition to introducing and examining the new CarHMM, we provide guidelines for all stages of an analysis using either an HMM or CarHMM. These include guidelines for pre-processing location data to obtain deflection angles and step lengths, model selection, and model checking. In addition to these practical guidelines, we link estimated model parameters to biologically meaningful quantities such as activity budget and residency time. We also provide interpretations of traditional “foraging” and “transiting” behaviours in the context of the new CarHMM parameters.
Keywords: hidden Markov model, movement ecology, discrete time, marine animal movement, autoregressive process, model checking
1 Introduction
The study of animal movement is fundamental to ecology because it is inherently linked to critical processes that scale from individuals to populations and communities to ecosystems (Hooten et al., 2017a, ). Rapid technological advancements over the past several decades have given rise to a variety of electronic tracking devices that can remotely monitor animals in challenging environments (Hussey et al.,, 2015) as well as an assortment of statistical methods for analyzing the resulting (big) movement data.
Statistical models for animal movement data are most commonly formulated in discrete time (Hooten et al., 2017b, ), and are increasingly aimed at inferring behavioural “states” from observed tracks. In this context, the data (called tracks, or location data) generally consist of a regularly observed time series of locations of an animal. Inferring behavioural states from location data was initially made possible by a proposal in Morales et al., (2004) to transform the data into a bivariate series of step lengths and deflection angles. In their example, they use characteristics of the step length and deflection angle series to determine when an elk is in an “encamped” state and when it is in an “exploratory” state.
There are many different ways to estimate behavioural states from this type of tracking data. While traditionally achieved using likelihood methods (frequentist or Bayesian), any unsupervised classification method can be used. Some examples are mixture models (Morales et al.,, 2004), clustering models, and -means clustering (Curry,, 2014). If researchers have actually observed an animal’s behaviour at some points in time (for example through recorded video), then any supervised classification method could also be used.
The work of Morales et al., (2004) along with that of Jonsen et al., (2005), popularized the state-switching model framework into the de facto way of analysing animal movement data in discrete time (McClintock et al.,, 2012; Whoriskey et al.,, 2017; Patterson et al.,, 2017). While not all animal movement models which incorporate state-switching into the movement process have distinct behavioural states, the ones that do generally fall under the hidden Markov model (HMM) framework. These models assume that there are underlying behaviours driving the animal movement process (Michelot et al.,, 2016).
Hidden Markov models for animal movement have a number of desirable properties: they have an easily computable likelihood which is typically fast to optimize, the model parameters have clear interpretations, and they can fairly easily handle different types of data (including missing data) in the same model (Zucchini et al.,, 2016). The baseline formulation of the HMM has a few key assumptions: the underlying state process is assumed to form a Markov chain, and the observed step lengths and deflection angles are conditionally independent given the behavioural state.
The effect of various violations of these assumptions are discussed in Pohle et al., (2017). They found that neglecting a semi-Markov state process (which directly models state residency time), a higher order Markov chain for the behavioural process, or violations of conditional independence can introduce bias to parameter estimates and favor models which have more behavioural states than actually exist. Semi-Markov state processes are also considered in Langrock et al., (2012) while higher order state processes are presented in Zucchini et al., (2016).
The papers in the recent Journal of Agricultural, Biological, and Environmental Statistics special edition on animal movement also started to address other problems associated with the discrete-time model framework in general, such as telemetry error, irregularly spaced data, and occasional missing data (McClintock,, 2017), the temporal scale and resolution of the behaviours involved in the data (Leos-Barajas et al.,, 2017), and choosing the number of behavioural states to use (Pohle et al.,, 2017). Methods for assessing goodness of fit for animal movement models were discussed in Potts et al., (2014), wherein they found that none of the 20 highest cited papers at the time tested goodness of fit to the data. Since then, the moveHMM (Michelot et al.,, 2016) and momentuHMM (McClintock and Michelot,, 2018) R packages have implemented easy to use residuals, although the use of residuals in the literature is still uncommon, or at least under-reported.
The current paper introduces a conditionally autoregressive hidden Markov model (CarHMM) that does not require the assumption of conditional independence of the movement process given the behavioural process. We do this by introducing an autocorrelation parameter in the step length distribution of the traditional HMM (such as that implemented in Michelot et al., (2016)). The use of an autocorrelated step length process was also present in a continuous-state model for estimating the effect of environmental covariates on behavioural memory in Forester et al., (2007).
Throughout, we provide general practice guidelines wherever possible. Since analyses of animal movement typically use offline data , we propose standardizing the observed step lengths by dividing by the mean observed step length. This allows comparison of models across data sources, animals, species, etc. We use a lag-plot of step length for determining if the conditional autocorrelation is necessary, and note its possible use to help in choosing the number of behavioural states to use. In the case of irregularly observed tracks, we also discuss how to choose an appropriate interpolation time step for the model, as well as how to deal with extensive missing data by grouping observations.
In Section 2, we present the formulation of the model including computation of the likelihood and give references for the theoretical properties. Section 3 discusses the biology associated with specific transformations of the model parameters. Section 4 deals with pre-processing the locations by choosing a time step and dealing with missing data, as well as model selection and validation. Both Section 3 and Section 4 are useful for most types of discrete-time models, including the HMM and the CarHMM. Section 5 presents four short simulation studies. Section 6 demonstrates best practice for using the CarHMM through the analysis of a male grey seal.
2 CarHMM formulation
We assume that the data consist of a set of step lengths between locations at time to and deflection angles between locations at times , , and . Locations are assumed to be observed on a discrete and evenly spaced time grid. Here, step lengths measure the distance between consecutive locations, and deflection angles measure the angular change in direction between three locations. We discuss observations which are irregularly spaced in Section 4.1.
We introduce a behavioural state process which is a Markov chain on a finite set of states . Thus the distribution for is completely determined by the value of and the transition probability matrix . The entry of gives the probability of transitioning from state to state . We assume (other choices are possible) that the initial distribution of the behavioural state Markov chain is given by the stationary distribution , which is the vector such that and .
Given the behavioural state at time , the step length at time and deflection angle at time are assumed to be conditionally independent of all other observations and behavioural states, with the key exception that step length at time is allowed to be dependent on step length at time . A first order autoregressive process is assumed for step lengths . While any valid distributions can be used, in this presentation we assume a gamma () distribution for step lengths and a wrapped Cauchy (WC) distribution for deflection angles . A distribution has mean (reversion level , autocorrelation ) and standard deviation , with the more traditional shape and scale parameters being and , respectively. The distribution has center and concentration with density function
[TABLE]
In cases where we have all of the data before analysis (i.e. we are not streaming the data), we standardize all step lengths by dividing by the observed mean step length. This removes units (for example, kilometers) and standardizes parameter interpretation across data sources, animals, species, etc. Comparison of standardized parameter estimates across data sources is dependent on many factors, including the temporal resolution of each data source, choices made during the model procedure, and the biology/ecology of the animals being compared. Conditional on the mean observed step length, dividing by the mean does not alter parameter inference since dividing a gamma distribution by a (non-zero) constant results in another gamma distribution. In practice, we store the observed mean step length so we can un-standardize later if desired. For the rest of the paper, we will assume that the symbol stands for standardized step length.
With all of the terms now defined, the CarHMM is formulated as
[TABLE]
Although the locations themselves do not enter the likelihood for the model directly, we include the “Movement” equation to show the connection between the locations and the step lengths and deflection angles. In this equation, represents the change in direction at time . If, as we strongly recommend, the coordinates are latitude-longitude pairs, then the equation as given is more of a symbolic representation. The fully written equation is based on spherical geometry. If the coordinates are projected, then can be written as a standard rotation matrix.
The likelihood is computed as the matrix product
[TABLE]
where is the diagonal matrix
[TABLE]
and is a vector of ones. Since and are considered conditionally independent given the behavioural state, their joint probability density function is the product of the individual densities. In practice, we compute the log-likelihood using forward recursion with scaling as presented in Section 3.2 of Zucchini et al., (2016).
When the autocorrelation is fixed at 0 for all , the CarHMM reduces to a standard HMM. In addition, when is fixed at 1 for all , it is possible to show that the CarHMM reduces to a component-wise relative of the hidden Markov movement model of Whoriskey et al., (2017) though the details will not be shown here. Further, other generalizations to the standard movement HMM such as adding a semi-Markov state process could be applied to the CarHMM.
When using the distribution must be non-negative and must be within the unit interval. Another useful choice of distribution for step length is the log-normal distribution, where the log of the step length has mean . In this case the parameters are unrestricted. However to ensure the step length process is stable, it is sufficient (but not necessary) that the estimates satisfy for all . If this is not the case it may be a sign of numerical instability in the optimizer.
We use the maximum likelihood framework; the parameters to be estimated are , , , , for giving parameters for the “Action” distribution, and the off-diagonal transition probabilities for giving parameters for the “Behaviour” distribution, for a total of parameters. The remaining transition probabilities are not free parameters since the row sums of must equal 1. The unobserved behavioural states are predicted using the well known Viterbi algorithm, see e.g. Zucchini et al., (2016).
Identifiability of models in the Markov-switching autoregressive class, which includes the CarHMM, is proven in Douc et al., (2004), with consistency and asymptotic normality of the ML estimates when using the log-normal distribution resulting from the same paper. Consistency and asymptotic normality when using the distribution is studied in Ailliot, (2006). One notable consistency condition is that the entries of must be strictly positive for parameter estimation to be consistent. If any estimated value is close to zero, this could be a sign of having too many states in the model, unless there is a biologically meaningful reason for including the extra state.
3 Interpretation of CarHMM parameters
Here we discuss the concepts of activity budget, behavioural residency time, and mean reversion level. These are all obtained as transformations of the model parameters and are related to the biology of the animal.
Both activity budget and behavioural residency time can be obtained from the transition probability matrix . The stationary distribution itself can be interpreted as an activity budget, where the entry of gives the expected proportion of time that the animal spends in the behavioural state. For example, the activity budget can give estimates of the proportion of time spent transiting as compared to foraging.
Behavioural residency time is the amount of time that an animal will remain in a given behavioural state before switching to a different state. These can be modelled explicitly using semi-Markov state processes, though in our case they follow a geometric distribution (Langrock et al.,, 2012). For a geometric distribution, the expected number of time steps spent in state is given by . When converted to real time units (hours, minutes, etc.) by multiplying by the chosen time step, this value gives an estimate of the time scale of the behaviour being modelled and is important in giving biologically meaningful interpretations to the behavioural states.
Since the step length process within a given behavioural state follows a first-order autoregressive process, the parameter gives the reversion level of the process for a given state. The mean reversion level is defined to be the expected value of step length as the time spent within the behavioural state approaches infinity.
[TABLE]
Within each behavioural state this value acts as an attractor in the step length distribution. Thus consecutive step lengths in the same behavioural state will tend to converge on this value, with the strength of attraction being inversely proportional to and proportional to the distance of the previous step length to .
One of the attractive features of hidden Markov models for movement data is the connection between the underlying behavioural state of the Markov chain with behaviours exhibited by the animal. While the connection between the behavioural state in the model and the behaviours exhibited by the animal is sometimes tenuous, it can be useful to label the behavioural states of the model. Two common “behaviours” used in the HMM context are “foraging” and “transiting” (quotations are used to emphasise that these labels may not reflect actual behaviour).
In the standard HMM, foraging is typically characterized by short step lengths and diffuse deflection angles. Transiting is typically characterized by long step lengths and deflection angles concentrated at zero degrees. We can update these behaviours in the CarHMM framework. Here foraging is characterized by short step lengths with little autocorrelation along with diffuse deflection angles. Transiting is characterized by longer and highly autocorrelated step lengths along with deflection angles concentrated at zero degrees.
4 Data inspection and model checking
4.1 Pre-processing locations
When dealing with marine animal tagging data the observed locations are typically not on a regular time grid. In order to use the CarHMM, we linearly interpolate the observed locations to a regular time grid. An alternative is to use the multiple imputation approach proposed in McClintock, (2017). However even when using this multiple imputation approach one must choose a sensible time grid.
Interpolation requires making a few decisions such as: how are observations which are very far apart in time (long stretches of missing data) dealt with? and, what is the best time step (the time between points on the temporal grid) to use for the interpolation? We deal with the first by splitting the track into separate groups whenever the time between consecutive observations is greater than some cutoff level, which we call the group cutoff level. After defining the time step, which we require to be the same for all groups, and the group cutoff level the observed locations are interpolated within their separate groups on to the regular time grid. The interpolated locations are then processed to obtain the deflection angles and step lengths which enter the likelihood.
There are two metrics we propose to help in choosing a time step and group cutoff level: the proportional sample size
[TABLE]
and the adjusted proportional sample size
[TABLE]
The first is designed to preserve the number of locations in the track, while the second is designed to preserve the number of data points which enter the likelihood.
To choose a (heuristically) best time step and group cutoff level, we recommend:
restrict the time step to be somewhere between the median and 3rd quartile of the observed time differences in the original data; 2. 2.
for whichever time step is chosen, restrict the group cutoff level to be no more than twice the time step (and no less than the time step itself); 3. 3.
with the above restrictions, set up a grid of time steps and group cutoff levels and compute and for each point in the grid. Choose whichever makes both and as close to 1 as possible.
We follow these guidelines in choosing the time step for the best practice analysis of Section 6. These guidelines are meant to avoid both over-smoothing the data (and therefore losing information) by choosing too large a time step, or inadvertently replicating the data by choosing too small a time step. Values of or less than one are indicators of over-smoothing while values greater than one are indicators of data replication.
A brief experiment suggested that interpolation (either linear interpolation or with a variety of different splines) may introduce a significant amount of autocorrelation in the step lengths. Further investigation is needed to determine the exact effects of interpolation. Whether most of the autocorrelation is inherent to the track or is introduced by interpolating the locations, accounting for it in the model (such as the CarHMM does) is necessary.
Once the locations are interpolated and grouped with a regular time step, they are processed to obtain deflection angles and step lengths. These should be obtained from unprojected coordinates so that both the deflection angles and step lengths are accurate throughout the spatial extent of the data. The model is fitted to all of the grouped data in a single likelihood, assuming that groups are independent of each other and that the groups share the same true parameters. Within a group, the first step length of that group is taken to be the initial condition for the step length autoregressive process, and the initial distribution of the underlying state is always taken to be the stationary distribution.
4.2 Model selection
We consider two components to model selection for the CarHMM: deciding whether to use the HMM or the CarHMM (i.e., whether to fix or not), and choosing the number of behavioural states.
First we introduce the lag-plot, an exploratory graphic useful for understanding the nature of any autocorrelation present in the step length process. The lag-plot at lag is a kernel-density plot of against . Examples of these plots are given in Figure 2 of Section 5. These lag-plots give a more detailed description of the autocorrelation than the simple autocorrelation function, and have a couple of immediately helpful uses.
Most importanly in the current case, by examining a lag-plot at lag 1, it can be possible to determine which of the HMM and CarHMM is more appropriate for the data. These plots show the different types of autocorrelation present within the HMM and CarHMM, compared with a real dataset. The HMM plot will have a pattern of distinct circular droplets along the line , a result of the autocorrelation in the behavioural states, while the CarHMM plot will have an elongated smear along the line , due to the within-state autocorrelation in step lengths.
In ideal cases, the lag-plot at lag 1 may also suggest the number of states exhibited in the data. Particularly for data with HMM-like autocorrelation, the number of distinct droplets corresponds to the number of distinct states. This becomes more complicated with more latent states and with more CarHMM-like autocorrelation. Choosing the number of states to use is a notoriously difficult problem, with traditional metrics such as AIC and BIC generally selecting too many states to be biologically meaningful. For in-depth discussion, we refer the reader to Pohle et al., (2017). A recommended starting point is to use as few states as necessary to achieve an adequate fit.
Other uses of the lag-plot include comparing the autocorrelation characteristics of different time step choices. For example, we could test the intuitively attractive idea that a short time step results in step lengths with high within-state autocorrelation, while a long time step results in step lengths with low within-state autocorrelation. Further, with multi-state models the traditional autocorrelation function can do a poor job of quantifying the autocorrelation. The lag-plot includes more detail such that the characteristics of each state can be discerned.
4.3 Residuals
For model checking we follow the probability scale residual framework of Shepherd et al., (2016), and in particular use one-step-ahead forecast residuals. Here, the forecast distribution for step length would be a mixture of gamma distributions with means , standard deviations , and mixture rates given by the row of .
If we have specified the model structure correctly then these residuals should be uniformly distributed on (-1,1) and exhibit no autocorrelation. Further, they should have this property within each behavioural state, though small sample size can be a problem here. Departures from a uniform distribution can be detected by looking at a quantile-quantile (Q-Q) plot of the residuals. Residual autocorrelation can be identified in plots of the autocorrelation function of these residuals, or in lagplots such as those proposed for model selection.
5 Simulation Study
To address the performance and properties of the new CarHMM model, we present brief vignettes of four simulation studies. First, we investigate the effect of track length. Second, we look at the effect of within-state autocorrelation. The third study looks at the effect of the transition probabilities. The fourth and final study compares the regular HMM and the CarHMM. When simulating data we simulate both the underlying behavioural state and observed data from scratch. We do not reuse the behavioural states estimated from original data.
The main metric we use to assess these simulations is the interquartile range (first and third quartiles) of the state estimate error. For a particular track, the state estimate error is the percentage of state estimates which do not agree with the true simulated state. The quartiles are then computed over e.g. 50 simulations under the same model.
To account for numerical instability in the maximization of the likelihood, our fitting procedure for these studies attempts to fit the model to each simulated track at most 10 times, with different random starting values each time, until the model converges. If the model does not converge successfully on any of those 10 attempts, we remove that particular track from consideration. We also remove tracks which give unreasonable parameter estimates (in particular, any model fit which gives a stationary distribution with an entry less than 0.01), or estimates which give clear signs of numeric instability (deflection angle concentration less than , or a transition probability matrix with any row having all equal entries). When using real data we can tweak exactly how we optimize the likelihood (change various control parameters, pick starting values, etc.) so this practice is not an inherent shortcoming of the model or fitting method.
The Viterbi algorithm is currently the most common way to estimate behavioural states in HMM-like models. Briefly, the algorithm takes as input parameter point estimates and the observed data, and outputs the most likely sequence of behavioural states as a point estimate. With the Viterbi algorithm, the accuracy of the state estimates is dependent on the amount of overlap of the state-dependent distribution. If two states have significant overlap, the Viterbi algorithm will perform much worse than if the two states were distinct.
At no point are standard errors of parameter estimates or uncertainty statements about the behavioural states considered. Because of this, any error in the parameter estimates directly translates to a source of error in the state estimates, with no hope of correcting for the uncertainty in the parameter estimates. Because the Viterbi algorithm only gives the most likely sequence of behavioural states with no uncertainty estimates, the estimated behavioural states must be interpreted with care. In addition to the tenuous connection between the behavioural state labels and the biology, there is the additional problem that we do not know how likely the most likely behavioural state path is. Simulations, such as the ones below, can help determine how much error to expect. However, since the actual states are unobserved, it is not possible to know the actual error rate.
5.1 Effect of track length
In practice, one would hope that collecting more data (i.e. longer animal tracks) would decrease the amount of error in both parameter estimates and state estimates. Simulations suggest that, while error in parameter estimates will decrease with longer track lengths, there is an inherent amount of error to be expected in behavioural state estimates that cannot be overcome with increased track length.
Figure 1 shows the bias and state estimate error for two different models. Here we compute bias as the median difference, across simulations, of parameter estimates from the true parameter. The two state model is a HMM which takes (slightly modified) parameters estimated from an elk dataset analyzed in the vignette for the R package moveHMM (Michelot et al.,, 2016). The three state model is a CarHMM which takes parameters estimated from a grey seal dataset (different from the one presented in Section 6). The parameters for both models can be found in Table 5 and Table 6, respectively, of ESM A.1
Figure 1 shows that collecting more and more data for a single track is not effective past a certain point. For the remaining simulation studies we use track lengths of 1,000. We report the first and third quartiles of the state estimate error, which Figure 1 suggests will have stabilized at this track length.
5.2 Effect of autocorrelation
The accuracy of the Viterbi algorithm depends heavily on the amount of overlap of the state-dependent distributions. Recall that the mean of the step length distribution is given by Consider the autocorrelation as a weight between , which will depend on the state at time , and , which does not. If is close to one in two states with drastically different , then the two states will overlap since is essentially irrelevant in both states.
Table 1 shows the state error rate for a two state model with different amounts of autocorrelation (each state taking either a low, medium, or high amount). The parameters are modified from the same elk example used in Section 5.1. Overall we see that increasing the autocorrelation of both states leads to an increase in the amount of state estimate error, while differentiating the amount of autocorrelation between the two states leads to a decrease in the amount of error.
5.3 Effect of transition probabilities
Unlike in the standard HMM, the observed state-dependent distributions for the CarHMM are indirectly affected by the transition probabilities of the underlying behavioural states. States with low autocorrelation act as anchors in the step length series, while states with high autocorrelation tend to wander. The longer that an animal is in a state with high autocorrelation (by having a high probability of remaining in the same state), the more we expect that step length series to wander.
Figure 4 in ESM A.1 gives observed step length distributions for a variety of different transition probabilities. Table 2 gives state estimate error under the same variety of transition probabilities. We see that the probability of remaining in state 2 (with high autocorrelation) affects the state estimate error, as this probability is what determines how free the second state is to wander. The more that the high autocorrelation state drifts away from the mean of the low autocorrelation state (from left to right in the table), the less overlap there is in their distribution, which increases the accuracy of the Viterbi algorithm. The parameters can be found in Table 7 of ESM A.1.
5.4 Comparison of HMM and CarHMM
To show the importance of accounting for conditional autocorrelation in the data, we simulate data under both the HMM and the CarHMM and fit both the HMM and CarHMM to each simulation. We use parameters from two different datasets: the “Low-High” two state parameters from the elk track considered in subsection 5.2, and three state parameters estimated from the grey seal track analyzed in Section 6. The parameters for the models can be found in Table 5 of ESM A.1, and Table 4 of Section 6, respectively. Figure 2 shows example lag-plots under the HMM and the CarHMM for the three state model. As mentioned earlier, these plots can help in model selection.
Table 3 shows the state estimation error rate for the four different scenarios. The CarHMM is just as effective as the HMM when fitted to HMM data with no conditional autocorrelation. However, the two-state HMM ( error) performs only slightly better than random guessing ( error) when fitted to CarHMM data with conditional autocorrelation. We expect this amount of error to persist across models that have at least one state with significant autocorrelation. The three-state HMM has the same problem, although performs much better than the 66% error expected from random guessing.
These simulations raise interesting questions about the validity of previous research utilizing hidden Markov models with irregularly timed data, especially since we suspect a non-trivial amount of autocorrelation is introduced through interpolating the locations to a regular grid. However, we only mention this point and leave the discussion for another time.
Computation Time and Implementation
All simulations were computed on a laptop running Linux with a quadcore Intel Core i7-7500U CPU with 8GB of RAM. To compare the computation speed of the HMM with the CarHMM, we timed how long it took to fit a three state HMM and a three state CarHMM 100 times to the seal data in Section 6. We also timed how long it took to simulate and refit 100 simulations from each model. The HMM averaged 2.43 seconds per fit, and an additional 0.77 seconds per simulation. The CarHMM averaged 2.37 seconds per fit, and an additional 1.23 seconds per simulation. The difference in computation time between the two models is essentially negligible. Our implementation of the CarHMM uses the R package Template Model Builder (Kristensen et al.,, 2015), which allows for fast computation through automatic differentiation. It also has the ability to fix parameters at given values, allowing our HMM and CarHMM implementation to be identical. Our implementation and other functional tools discussed earlier are available as an R package at the first author’s GitHub page. This package also includes the data used in Section 6.
6 Best practice analysis of a male grey seal track
In this Section we demonstrate what we now consider to be basic best practice for analyzing animal movement data and reporting subsequent results. Plots, including residual plots and state estimate maps, are given in ESM A.2.
The data are a subset of a male grey seal track on the Scotian shelf, analyzed previously in Whoriskey et al., (2017). The seal was tracked using GPS with negligible observation error. Due to some data collection issues (the median time differences abruptly change without explanation) we will look at only the final 3,158 locations with time differences having a mean, median, and 3rd quartile of 100, 64, and 122 minutes, respectively.
First, one must choose values for the time step and group cutoff. To do this, set up a grid of values where the time step ranges from 60 minutes to 120 minutes by increments of 3 minutes, and the group cutoff ranges from the time step to twice the time step in increments representing 5% of the time step. This range of values for the time step is chosen to range approximately from the median to the 3rd quartile. Refer to Section 4.1 for more detail.
Both metrics for choosing a good time step and group cutoff discussed in Section 4.1 chose an optimal time step of 66 minutes and group cutoff of 132 minutes. The resulting interpolated track consists of 3,129 locations in 251 groups with and . The mean of the unstandardized step lengths is 2.10 kilometres per time step (1.91 km/hr).
The most useful plot of the data is the lag-plot of vs. and is shown as part of Figure 3. This plot shows the smeared texture that is characteristic of the CarHMM. The residuals for a two state CarHMM have autocorrelation on the border of significance. Neither a three state or a four state CarHMM give improved residuals (not shown). The data nor any of the residuals showed evidence of long-term seasonality. Given no other reason to choose a specific number of states, we recommend using the least number of states which you feel accurately describe the data.
We also remind the reader that behavioural state labels such as “foraging” and “transiting” may not be reflective of the actual biology.
Two state model
The parameter estimates are given in Table 7 in ESM A.1. State 2 is interpretable as a “transiting” behaviour. The autocorrelation parameter () and concentration parameter are suitably high, and the standard deviation is suitably low. A map of the state estimates also indicates a “transiting” behaviour.
State 1 does not have as clear an interpretation. It may be tempting to label it a “foraging” behaviour to complement the “transiting” behaviour of State 2, however the parameter estimates for State 1 do not fully support this view. The autocorrelation parameter ( is not close to 0 and the concentration parameter () is higher than expected. Further, a map of the state estimates shows that some of the behaviour picked up by this state does not have traditional “foraging” characteristics. This suggests that State 1 may be picking up two distinct behaviours. We believe these behaviours may be a “foraging” behaviour and a “large area search” behaviour, although many other possibilities may exist. For this reason, we suggest using a third state to further differentiate these behaviours.
Three state model
The parameter estimates are given in Table 4. State 1 is closer to a “foraging” behaviour than it was in the two state model, and a map of the state estimates places State 1 where we might a priori expect “foraging” to take place based solely on the locations. State 3 is archetypal “transiting” behaviour with both and close to 1. Based on the transition probabilities which do not allow transitions between State 1 and State 3, one would label State 2 a “transitional” behaviour. Based on parameter estimates and a map of the state estimates there is no reason to believe that a fourth state is needed.
The expected residency times are: 3.48 timesteps (3 hr 50 min) for the “foraging” behaviour; 4.93 timesteps (5 hr 25 min) for the “transitional” behaviour; and 8.33 timesteps (9 hr 10 min) for the “transiting” behaviour. The expected activity budget gives 26.4% of the seal’s time spent “foraging”, 22.8% of its time spent “transiting”, and 50.8% of its time transitioning between the two. A simulation study of 93 convergent simulations out of 100 gave first and third quartiles of the state estimate error as .
7 Discussion
We have introduced the conditionally autoregressive hidden Markov model (CarHMM) for highly accurate tracking data as an alternative to both the HMM originally developed in Morales et al., (2004) and the HMMM documented in Whoriskey et al., (2017).
Subjective choices are often involved during data processing and model fitting. When fitting discrete-time movement models, the choice of time step often depends on the discrete behaviour of interest as well as the observation frequency (Breed et al.,, 2012). We propose a statistic to help the user choose a time step based on producing a roughly similar number of interpolated locations and data points as the original tracking data set. This could be combined with the multi-scale model of Leos-Barajas et al., (2017) to study the discrete behaviour of interest. We have additionally proposed a method to deal with long periods of missing data. In some formulations of the HMM, a missing location enters the joint likelihood by including the contribution of the underlying behavioural state Markov chain ( in our formulation above) while removing the observation contribution for that location ( in our formulation) (Zucchini et al.,, 2016). We instead decided to split the track into multiple groups for compartmentalized model fitting, and offered metrics for choosing how to perform this partition. Frequent long periods of missing data can be common in marine environments.
The CarHMM draws a new link between HMMs and the DCRWS model of Jonsen et al., (2005). Within the marine context, the two most commonly sought-after behavioural states are foraging and transiting. These states are typically assumed to follow an area-restricted search pattern, whereby foraging patches are characterized by shorter step lengths occurring in diffuse directions, and are interspersed with periods of directed travel consisting of longer step lengths directed straight ahead (see e.g. Whoriskey et al., (2017)). While these states can be directly inferred from the state-dependent distributions of the HMM, the interpretation of these state estimates resulting from the DCRWS is less straightforward. Within the DCRWS, the main parameter influencing the step lengths is an autocorrelation term (). Usually (again see e.g. Whoriskey et al., (2017)), high values are interpreted as highly persistent movement (indicative of transiting) and low values constitute highly random movement (representing foraging). As a result, transiting and foraging are not necessarily delineated by longer and shorter step lengths. The CarHMM combines the two approaches such that we now have a clear interpretation of the step lengths but can still account for the fact that some animals will tend to move in a similar (or dissimilar) manner across time. These properties make the CarHMM a useful model for linking movement data to behavioural characteristics.
Acknowledgements
Appendix A Extra Supplementary Material
A.1 Simulation graphics and tables
A.2 Graphics for the male grey seal
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Ailliot, (2006) Ailliot, P. (2006). Some theoretical results on markov-switching autoregressive models with gamma innovations. Comptes Rendus Mathematique , 343(4):271–274.
- 2Breed et al., (2012) Breed, G. A., Costa, D. P., Jonsen, I. D., Robinson, P. W., and Mills-Flemming, J. (2012). State-space methods for more completely capturing behavioral dynamics from animal tracks. Ecological Modelling , 235-236:49 – 58.
- 3Curry, (2014) Curry, D. M. (2014). An algorithm for clustering animals by species based upon daily movement. Procedia Computer Science , 36:629 – 636. Complex Adaptive Systems Philadelphia, PA November 3-5, 2014.
- 4Douc et al., (2004) Douc, R., Moulines, E., Rydén, T., et al. (2004). Asymptotic properties of the maximum likelihood estimator in autoregressive models with markov regime. The Annals of statistics , 32(5):2254–2304.
- 5Forester et al., (2007) Forester, J. D., Ives, A. R., Turner, M. G., Anderson, D. P., Fortin, D., Beyer, H. L., Smith, D. W., and Boyce, M. S. (2007). State-space models link elk movement patterns to landscape characteristics in Yellowstone national park. Ecological Monographs , 77(2):285–299.
- 6(6) Hooten, M., Johnson, D., Mc Clintock, B., and Morales, J. (2017 a). Animal Movement: Statistical Models for Telemetry Data . CRC Press.
- 7(7) Hooten, M. B., King, R., and Langrock, R. (2017 b). Guest editor’s introduction to the special issue on “animal movement modeling”. Journal of Agricultural, Biological and Environmental Statistics , 22(3):224–231.
- 8Hussey et al., (2015) Hussey, N. E., Kessel, S. T., Aarestrup, K., Cooke, S. J., Cowley, P. D., Fisk, A. T., Harcourt, R. G., Holland, K. N., Iverson, S. J., Kocik, J. F., Mills Flemming, J. E., and Whoriskey, F. G. (2015). Aquatic animal telemetry: A panoramic window into the underwater world. Science , 348(6240):1255642.
