Regularized estimation for highly multivariate log Gaussian Cox processes
Achmad Choiruddin, Francisco Cuevas-Pacheco, Jean-Fran\c{c}ois, Coeurjolly, Rasmus Waagepetersen

TL;DR
This paper introduces stable and efficient methods for estimating parameters in complex multivariate log Gaussian Cox processes, enabling better analysis of high-dimensional point pattern data, exemplified by tropical rainforest ecology.
Contribution
It develops novel numerical algorithms for parameter estimation and model selection in highly multivariate log Gaussian Cox processes, addressing computational challenges.
Findings
Algorithms are numerically stable and efficient.
Applied successfully to tropical rainforest data.
Improves modeling of complex multivariate point patterns.
Abstract
Statistical inference for highly multivariate point pattern data is challenging due to complex models with large numbers of parameters. In this paper, we develop numerically stable and efficient parameter estimation and model selection algorithms for a class of multivariate log Gaussian Cox processes. The methodology is applied to a highly multivariate point pattern data set from tropical rain forest ecology.
| Method | |||||
|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | |
| Minimized objective function | |||||
| SQN | 6.61 | 4.76 | 5.39 | 6.32 | 4.51 |
| CBD | 3.55 | 1.96 | 1.73 | 1.62 | 1.57 |
| Timings (seconds) | |||||
| SQN | 0.96 | 1.98 | 3.97 | 6.45 | 8.99 |
| CBD | 1.99 | 3.11 | 4.26 | 5.30 | 5.92 |
| Method | Outliers (%) | |||||
| 1 | 2 | 3 | 4 | 5 | ||
| SQN | 0.41 | 0.93 | 1.10 | 1.17 | 1.09 | 10.3 |
| CBD | 0.41 | 0.25 | 0.29 | 0.32 | 0.39 | 0 |
| SQN | 0.58 | 0.54 | 0.44 | 0.89 | 0.98 | 1.1 |
| CBD | 0.34 | 0.18 | 0.28 | 0.39 | 0.50 | 0 |
| SQN | 0.0791 | 0.1752 | 0.1337 | 0.4091 | 0.4566 | 11.5 |
| CBD | 0.0050 | 0.0091 | 0.0110 | 0.0005 | 0.0004 | 0 |
| LSE | LASSO | LASSO | ||||||||||
| 0 | 1 | 2 | 3 | 0 | 1 | 2 | 3 | 0 | 1 | 2 | 3 | |
| Min | 47 | 28 | 13 | 12 | 42 | 32 | 21 | 5 | 16 | 37 | 30 | 17 |
| 1-SE | 46 | 32 | 22 | 0 | 15 | 20 | 65 | 0 | 10 | 22 | 65 | 3 |
| LSE | LASSO | LASSO | ||||||
|---|---|---|---|---|---|---|---|---|
| Min | Min | Min | 1-SE | Min | 1-SE | Min | 1-SE | |
| 0.26 | 0.33 | 0.33 | 0.40 | 0.36 | 0.54 | 0.40 | 0.54 | |
| 0.42 | 0.54 | 0.54 | 0.58 | 0.56 | 0.75 | 0.63 | 0.76 | |
| 0.04 | 0.05 | 0.05 | 0.02 | 0.03 | 0.01 | 0.04 | 0.01 | |
| 0.28 | 0.31 | 0.32 | 0.35 | 0.33 | 0.41 | 0.37 | 0.42 | |
| LSE | LASSO | LASSO | |||||||||||||
| 0 | 1 | 2 | 3 | 4 | 0 | 1 | 2 | 3 | 4 | 0 | 1 | 2 | 3 | 4 | |
| Min | 19 | 21 | 18 | 19 | 23 | 14 | 31 | 20 | 19 | 16 | 6 | 15 | 20 | 21 | 38 |
| 1-SE | 27 | 36 | 20 | 12 | 5 | 22 | 37 | 21 | 8 | 12 | 21 | 22 | 25 | 11 | 21 |
| LSE | LASSO | (LASSO) | |||||
|---|---|---|---|---|---|---|---|
| Min | 1-SE | Min | 1-SE | Min | 1-SE | ||
| 0.50 | 0.67 | 0.44 | 0.48 | 0.78 | 0.51 | ||
| 0.58 | 0.89 | 0.54 | 0.70 | 0.88 | 0.76 | ||
| 0.02 | 0.02 | 0.01 | 0.02 | 0.02 | 0.02 | ||
| 0.35 | 0.35 | 0.34 | 0.39 | 0.35 | 0.40 | ||
| Lower | -1 | -0.5 | -0.2 | 0 | 0.2 | 0.5 |
|---|---|---|---|---|---|---|
| Upper | -0.5 | -0.2 | 0 | 0.2 | 0.5 | 1 |
| 2 | 6 | 9 | 13 | 22 | 48 | |
| 0 | 2 | 15 | 60 | 19 | 4 |
| Interval | 0-0.25 | 0.25-0.5 | 0.5-0.75 | 0.75-1 |
|---|---|---|---|---|
| Number of species | 46 | 20 | 10 | 10 |
| Species (%) | 53 | 23 | 12 | 12 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Regularized estimation for highly multivariate log Gaussian Cox processes
Achmad Choiruddin
Department of Mathematical Sciences, Aalborg University
Francisco Cuevas-Pacheco
Department of Mathematical Sciences, Aalborg University
Jean-François Coeurjolly
Department of Mathematics, Université du Québec à Montréal
Rasmus Waagepetersen
Department of Mathematical Sciences, Aalborg University
Abstract
Statistical inference for highly multivariate point pattern data is challenging due to complex models with large numbers of parameters. In this paper we develop numerically stable and efficient parameter estimation and model selection algorithms for a class of multivariate log Gaussian Cox processes. The methodology is applied to a highly multivariate point pattern data set from tropical rain forest ecology.
Key words: cross pair correlation, elastic net, LASSO, log Gaussian Cox process, multivariate point process, proximal Newton method.
1 Introduction
Highly multivariate point pattern data are becoming increasingly common. Tropical rain forest ecologists, for example, collect data on locations of thousands of trees belonging to hundreds of species. Likewise, huge space-time data sets regarding scene, time and type of crimes are recorded and made publicly available for many major cities across the world. Research on statistical methodology for multivariate point patterns has mainly considered bivariate or trivariate point patterns. Some exceptions are Diggle et al. (2005) and Baddeley et al. (2014) who considered four- and six-variate multivariate Poisson processes and more recently Jalilian et al. (2015) and Waagepetersen et al. (2016) who considered five- and nine-variate multivariate Cox processes. A truly high-dimensional analysis was conducted by Rajala et al. (2018) who introduced a multivariate Gibbs point process and applied it to a point pattern data set containing locations of 83 species of rain forest trees.
A particular challenge regarding modeling of highly multivariate point patterns is that models easily become very complex with large numbers of parameters. To enhance interpretability of fitted models and numerical stability of estimation, Rajala et al. (2018) used regularization methods such as the group lasso. The possibility of using regularization was also mentioned in the discussion of Waagepetersen et al. (2016) in the context of multivariate log Gaussian Cox processes.
The type of multivariate log Gaussian Cox process considered by Waagepetersen et al. (2016) and reviewed in Section 2 has a simple and natural interpretation and e.g. enables the user to decompose variation according to different sources and to group different types of point patterns according to similarities in their spatial distributions, see Waagepetersen et al. (2016) for details. However, the fitting of these models is very challenging in the highly multivariate case due to model complexity. In Section 3 of this paper, we develop a numerically stable and efficient parameter estimation methodology by introducing regularization and using efficient convex optimization algorithms. We test the methodology in a simulation study in Section 4 and apply it to a tropical rain forest data in Section 5. Section 6 contains some concluding remarks.
2 Multivariate log Gaussian Cox processes
A multivariate log Gaussian Cox point process (see Møller et al., 1998) is a multivariate point process , , where each component , , is a Cox process driven by a log Gaussian random intensity function . Conditionally on the , the are independent Poisson point processes each with intensity function . As in Waagepetersen et al. (2016), we assume that the random intensity functions are of the form with
[TABLE]
The terms are deterministic and typically given in terms of regressions on observed covariates. The terms and are zero-mean Gaussian fields. The can be mutually correlated while the are assumed to be independent. The are assumed to be stationary with variances and correlation functions , . For the we assume that
[TABLE]
where , is a real valued coefficient matrix, and the , , are independent zero-mean stationary Gaussian fields with variance one. In our applications we also consider the case meaning that the are omitted in (1). The can be interpreted as effects of unobserved spatial covariates while the represent sources of clustering which are specific to each type of points. We denote by the correlation function of . For the correlation functions and we introduce isotropic parametric models and , where and are correlation scale parameters. Specifically, we consider in this paper exponential correlation functions , , although many other choices are available (Chilès and Delfiner, 1999).
2.1 Intensity function and pair correlation function
Let denote the th row of . Following Møller et al. (1998), the intensity function of is \rho_{i}(\mathbf{u})=\exp\big{[}\mu_{i}(\mathbf{u})+\boldsymbol{\alpha}_{i\cdot}\boldsymbol{\alpha}^{{\mbox{\scriptsize\sf T}}}_{i\cdot}/2+\sigma^{2}_{i}/2\big{]} while the cross pair correlation function for the pair and is
[TABLE]
for . Consider two spatial locations and . Then represents the cross-Palm intensity function (Coeurjolly et al., 2017) and can be interpreted as the intensity function of conditional on that . Hence () implies that presence of a point from at increases (decreases) the intensity of at . Thus () implies repulsion (attraction) between points of and at lag . Similarly, a large value of leads to strong attraction among points of separated by a lag .
Non-parametric kernel estimates of the are given by
[TABLE]
where is the observation window, is a kernel function depending on a smoothing parameter , denotes area and denotes the translate of by the vector (Møller and Waagepetersen, 2003). The quantities and are estimates of the intensity functions of and , typically obtained from regression models depending on observed covariates through maximizing the composite likelihood (see e.g. Waagepetersen, 2007; Møller and Waagepetersen, 2007) or its regularized versions (e.g. Thurman et al., 2015; Choiruddin et al., 2018).
2.2 Least squares estimation
Let be the parameter vector consisting of the components of , , and . Let further
[TABLE]
The objective function used by Waagepetersen et al. (2016) for parameter estimation is of the form
[TABLE]
where
[TABLE]
, , are obtained using (3) for lags and the are non-negative weights. The matrix is () or () with rows () or (), , where
[TABLE]
Waagepetersen et al. (2016) minimized using a standard quasi-Newton method.
2.3 Inference regarding multivariate dependence structure
The model (1) enables us to decompose the covariances of the latent Gaussian fields into contributions from the common fields and the type-specific fields . Specifically, Waagepetersen et al. (2016) considered for each type and lag the proportion of variance (PV) due to the common fields:
[TABLE]
These are useful e.g. for grouping species based on how much of the variation is due to common factors respectively type-specific factors. Furthermore, from and we can compute the matrix of lag zero inter-type covariances due to the common latent fields with th entry
[TABLE]
as well as the lag zero covariances between the fields including both common and type-specific effects,
[TABLE]
A row informs on the dependence of on the common latent fields. Considering the norms of differences , we are able to group the different types of point patterns according to their dependence on the latent factors .
As discussed in Waagepetersen et al. (2016), the distribution of our multivariate log Gaussian Cox process is invariant to 1) simultaneous permutation of columns in and corresponding ’s and 2) multiplication of a column in by . Thus we can not identify individual parameters and without imposing constraints on the parameter space.
In our simulation studies in Section 4, we therefore follow Waagepetersen et al. (2016) by restricting attention to identifiable functions of and such as the aforementioned proportions of variances and covariances and norms of differences between rows of . In the application, we also consider the percentage of zero entries when is estimated using elastic net regularization with , see next section. The more zeros, the less complex is the dependence structure of the multivariate log Gaussian Cox process.
3 Regularized least squares estimation
The parameter vector is of potentially very high dimension, especially due to the many components of the parameter matrix . To enhance interpretability and numerical stability of estimation we suggest to introduce regularization and thus consider the regularized least squares criterion
[TABLE]
where is given by (5), is a nonnegative tuning parameter and is a convex penalty function. We consider in the following the elastic net penalization (Zou and Hastie, 2005) , , which embraces LASSO (Tibshirani, 1996) and ridge regression (Hoerl and Kennard, 1988) techniques by setting or respectively.
Using regularization in a related factor analysis was previously suggested by Choi et al. (2010). Their simpler setting corresponds to directly observing vectors , , where is modeled as in (1) but with zero spatial correlation. In contrast, our are unobserved with spatial correlation modeled via the correlation functions and . Thus the computational methodology suggested by Choi et al. (2010) is not applicable in our situation.
To minimize (7) with respect to , we employ a cyclical block descent algorithm where , , and are updated in turn. The updating is iterated until relative function convergence of the criterion (7). The details of the block updates are given in the following two sections and Appendices A-B. Pseudo-code for the full algorithm is given in Appendix B.3.
3.1 Update for and
Our strategy for updating and is to use for , a least squares update of followed by an update of using a cyclical coordinate descent algorithm. The motivation for updating rows instead of other subsets of is that the update of , keeping all other parameters fixed, is quite close to a standard least squares problem, as will be evident in the following.
The relevant part of the objective function for the updates of and given all other parameters is
[TABLE]
where the th column of is the th column of multiplied by . In other words, for , where is the diagonal matrix with diagonal entries . For ease of notation we here omit the dependence of and on the fixed parameters and . Note that (8) is equivalent to a standard least squares objective function for except for the middle term that depends on , , cf. (4).
The minimization of with respect to only involves the middle term in (8). This is a standard least squares problem except that we require to be non-negative. Thus,
[TABLE]
An explicit formula for this update is given in Appendix B.1.
To update (given and all other parameters), we use a so-called proximal Newton update (Lee et al., 2014, and Appendix A) where the middle term in (8) is replaced by a quadratic approximation around the current value . We denote by the resulting approximate objective function (to be detailed in the next paragraph). Since is a regularized linear least squares objective function, minimization can be performed using a standard coordinate descent algorithm (see e.g. Hastie et al., 2015).
A very simple quadratic approximation of the middle term of (8) is
[TABLE]
where \tilde{X}_{ii}^{k}=X_{ii}\text{Diag}\big{\{}\alpha_{i1}^{(k)},\ldots,\alpha_{iq}^{(k)},1\big{\}}. Nevertheless, the curvature of this quadratic approximation does not match the curvature of the original term at . Instead we use a second-order Taylor approximation as detailed in the Appendix A.1 which results in the explicit expression for given by
[TABLE]
where
[TABLE]
and denotes the first columns in .
We obtain
[TABLE]
using coordinate descent with an explicit formula for the updates given in Appendix B.2. Further, define for some ,
[TABLE]
Thus, is obtained using as a search direction with step size controlled by . Following Lee et al. (2014, Proposition 2.3), one can show (see Appendix A.2) that if is small enough. That is, if the minimization of is combined with a line search the resulting update is guaranteed to decrease the objective function written in (8).
3.2 Update for and
To update and given all other parameters, we first reparameterize the objective function in terms of and . We then update and in turn using a standard quasi-Newton update as implemented in the optim routine in the R language with method bfgs (Broyden-Fletcher-Goldfarb-Shanno update). Finally, we transform back using the exponential to get updates of and .
We also tried other options: joint update of without log-transformation but introducing box constraints to avoid negative values and joint quasi-Newton update of the log-transformed parameters . For simulated data examples, the option with separate updates of and performed best.
3.3 Initialization
We initialize the components by a sample of independent random normals with mean zero and standard deviation 0.05 while we choose 1 for the initial values of the components in . For and we choose initial values that depend on the scale of the observation window to avoid that the corresponding covariance functions become essentially constant equal to zero (too small initial values) or to one (too large initial values). For the unit square observation window, for example, the initial values for and were chosen randomly from the uniform distribution on . Regarding the choice of weights introduced in Section 2.2, we follow arguments by Waagepetersen et al. (2016) and fix, for and , for and .
3.4 Strategy to determine and regularization parameters and
In our applications we consider just a few values (ridge), (mix of ridge and LASSO, i.e. elastic net) and (LASSO). For each of the values of we use a two-dimensional -fold cross validation approach to select optimal values and among prespecified values and (e.g. Hastie et al., 2013, Chapter 7). The procedure is as follows.
- Step 1.
We split indices ( and ) into sets (see details below). 2. Step 2.
For each and , we obtain an estimate by minimizing equation (7) with replaced by 0 for . The cross validation score for and is then obtained by
[TABLE]
where and . 3. Step 3.
To obtain and , we minimize w.r.t and , i.e.,
[TABLE]
The sets in Step 1 need to be chosen carefully. First, since and are strongly correlated when and are close, we leave out blocks of consecutive indices. Second, we do not include diagonal indices in the sets since values include contributions from the type-specific random fields. The diagonal values thus do not provide so much information about and omission of these values further makes the estimation procedure less stable regarding and . So, to determine each subset , we arrange the with lexicographically in a vector and split this vector into consecutive blocks of length . These blocks are then assigned to the different at random.
The one standard error (1-SE) rule is an alternative way to select and based on the CV scores obtained from (12) (e.g. Hastie et al., 2013). In case of fixed, the 1-SE rule chooses the largest for which the CV score is less than the smallest CV score plus one standard deviation. In the case where both and is to be selected, we adapt the 1-SE rule by starting with given by (13) and then choosing to be the smallest and largest possible such that the following condition holds:
[TABLE]
where
[TABLE]
Hence, the 1-SE rule attempts to select the most simple model whose CV score is within one standard error of the minimal CV score.
Finally, note that when or and is chosen, the resulting estimate of may contain columns that consist entirely of zeros. The effective number of columns in then becomes smaller than .
4 Simulation study
We conduct two simulation studies to evaluate the regularized least squares technique for parameter estimation and the cross-validation (CV) method to select and . The setting of the first study corresponds to the simulation study in Waagepetersen et al. (2016). We first compare the estimates obtained using the new cyclical block descent (CBD) algorithm developed in Section 3 with the method proposed by Waagepetersen et al. (2016). In this regard, we consider values of and for comparison purposes, we fix since regularization was not used in Waagepetersen et al. (2016). Next we consider only the new algorithm with the objective of comparing different CV options for selecting and , cf. Section 3.4, and to study the effect of regularization. The second study has the same objective but with a more complex setting for the simulations. In both simulation studies we use for the CV and we only consider the LASSO option () for regularization.
To asses the parameter estimates, we consider the root mean squared errors (RMSEs) of the estimates. For a real parameter and estimate , the RMSE is
[TABLE]
For each of the parameter matrices/vectors , , , or the vector of proportions of variances at lag 0 (PV), we evaluate the average of RMSEs for the components in these quantities. For example, we compute the average of RMSEs for each entry in the matrix .
4.1 Comparison of methods for least squares estimation
The first study follows the one in Waagepetersen et al. (2016) for which 200 point patterns in are generated from multivariate log Gaussian Cox processes as defined in Section 2, with and . The true parameters are: and
[TABLE]
The trend models are set such that the expected number of points is 1000 for each . A uniform kernel with bandwidth 0.005 is used for the non-parametric estimation of the cross pair correlation function at equispaced lags between 0.025 and 0.25.
For each simulation we compare two methods for minimizing (7) with and :
The standard quasi-newton (SQN) optimization algorithm considered by Waagepetersen et al. (2016) and implemented in the R package optimx. This algorithm updates all parameters jointly. 2. 2.
The new CBD algorithm described in Section 3.
The comparison is in terms of minimization of the objective function, computing time and RMSEs.
Table 1 reports the averages of the values of the minimized objective functions and the computational times over the 200 simulations. All timings are carried out on a Dell R740 2 x 14 cores (Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz) 768 GB RAM 2x200gb SSD 960 GB NVME. CBD performs considerably better in terms of minimizing the objective function than SQN. SQN is somewhat faster than CBD for small but slower for larger . The computing times for SQN grow quite quickly with increasing while the computing times seems more stable for CBD.
The RMSE results are shown in Table 2. For the calculation of the RMSEs, we exclude small percentages of very extreme parameter estimates. These percentages are reported in the last column of Table 2. CBD performs better than SQN since smaller RMSEs are obtained and there are no outlying parameter estimates. For SQN quite large percentages of extreme parameter estimates are observed.
4.2 Assessment of cross-validation and regularization methods with
In this section we continue with the simulations from the previous setting but restrict attention to CV selection of and using CBD for optimization with the LASSO regularization (). We select values of in and values of in which has 20 elements and where the non-zero values of grow log-linearly from to . We consider three situations: (1) we select from with fixed, thus least squares estimation (LSE) is performed; (2) we search for the jointly optimal ; (3) we fix and select from . Recall that the selection of a relatively big may lead to zero columns in the estimate. We therefore consider the effective as defined in Section 3.4. Thereby we can also evaluate the selection of in situation (3). In case of (2) we both consider the minimum CV (Min) and the 1-standard error (1-SE) rules to select and .
Table 3 shows the distribution of absolute distance between and the true . For LSE, using the Min rule, coincides with the true for 47% of the simulations and differs at most by 1 from the true in 75% of the simulations. The results with the 1-SE rule are similar with percentages and . LASSO with Min rule for joint selection of performs similarly to LSE with the corresponding percentages 42 and 74 %. With fixed the percentages are reduced to 16% and 53 %. Using 1-SE rule, the LASSO forces many columns to be zero leading to quite small percentages where .
RMSEs are reported in Table 4 for all three situations. In addition, in the first columns, we consider the case fixed assuming the true is known. We first note that LASSO gives worse results than LSE when is fixed. In general, for unknown , LSE and LASSO perform quite similarly when the Min rule is used. The results are worse when 1-SE is used and in particular for LASSO. When is fixed to and only is selected the results are worse than for LASSO with selected by the Min rule while the results with are similar to LASSO with selected by the 1-SE rule.
The overall impression is that LSE performs slightly better than LASSO, especially in estimating . This may indicate that when is relatively small, selection of with (LSE) already gives sparse results. Another reason that LASSO does not improve RMSE may be that the true is not that sparse having only 40% zero components. Thus the bias introduced by regularization is not counterbalanced by a reduction in variance. Also, the 1-SE rule does not seem preferable in this situation. In the next section we consider a more complex setting with .
4.3 Assessment of cross-validation and regularization methods with
In this experiment, we study a more complex situation with a higher and more variation in the parameters. We simulate 200 point patterns from a multivariate log Gaussian Cox process with , , , and parameters
[TABLE]
[TABLE]
and equal to
[TABLE]
The settings for the trend models, the kernel estimation and the cross validation are as in the previous simulation study except that . In , 40% of the components are zeros and 20% are of absolute value less than 0.15. The remaining components have absolute value greater than 0.7.
Table 5 shows the distribution of the absolute distance between and the true . Considering first the Min rule, with LSE, concurs with the true in 19% of the simulations and differs at most by 2 from the true in 58% of the simulations. The corresponding percentages are 14% and 65 % for LASSO, and 6% and 41 % for LASSO with fixed. In this situation, the 1-SE rule seems advantageous for selecting . For example, the percentage of ’s which differ from the true by at most 2 improves from 58% to 83 % for LSE, from 65% to 80 % for LASSO, and from 41% to 68 % for LASSO with fixed .
Table 6 details the RMSE results. The superiority of the 1-SE rule when selecting does not translate into better results in terms of RMSE except for LASSO with fixed where better results are obtained with 1-SE than with Min. The best results are obtained with LASSO using the Min rule for selecting and . This indicates that regularization is indeed helpful in complex settings with relatively large .
Based on the simulation studies, for analyzing highly multivariate point pattern data, we recommend to use regularization with the Min rule for selecting and .
5 Application
In a 50-hectare region of the tropical moist forest of Barro Colorado Island (BCI) in central Panama, censuses have been carried out where all free-standing woody stems with at least 10 mm diameter at breast height were identified, tagged, and mapped, resulting in maps of over 350,000 individual trees with around 300 species (see e.g. Hubbell and Foster, 1983; Condit et al., 1996; Condit, 1998). In addition, 13 spatial covariates are also available containing topological attributes and soil nutrients (see Figure 5). Our main objective is to study the impact of regularization and the computational feasibility of our method. We first consider 9 tree species, Psychotria, Protium t., Capparis, Protium p., Swartzia, Hirtella, Tetragastris, Garcinia, Mourmiri, with intermediate abundances ranging from 2500 to 7500 and previously analyzed by Waagepetersen et al. (2016). The plots of locations of each species are shown in Figure 6. The main aim of this analysis is to compare the results with our new algorithm to those obtained by Waagepetersen et al. (2016). Secondly, to test our algorithm in a more challenging situation, we analyze a highly multivariate point pattern involving species of trees with at least 400 individuals, resulting in 86 species.
For each species, we use maximum composite likelihood to fit log-linear regression models involving the spatial covariates for the -terms in (1). We then estimate the cross pair correlation function using (3). Therefore, the variation due to observed covariates are filtered out and the non-parametric estimates of cross pair correlation function hence capture the residual correlation due to unobserved covariates, species-specific factors, and any other sources.
5.1 Application with 9 species
For each value of we apply -fold CV to select and where as in the simulation studies and . The upper left plot in Figure 1 shows for each , as a function of . For comparison with Waagepetersen et al. (2016) we also show in this plot against (LSE). A general pattern for ridge, elastic net and LASSO is that the cross validation scores decrease quite quickly as a function of until around and after that the CV scores stabilize or decrease slowly. The CV scores for ridge () are consistently smaller than those for elastic net () and LASSO (). Hence we select . The minimal CV score for is obtained with . However, in the interest of model simplicity, we choose and since the decrease in CV score is rather minor from to .
For comparison, the minimal CV score with LASSO is obtained with and . However, in this case, the resulting effectively selected is three since the resulting estimate of has 5 zero columns. In case of LSE (), the CV procedure chooses . The second-smallest CV with LSE is obtained with which was the value chosen in Waagepetersen et al. (2016). The difference in cross validation results for LSE compared with Waagepetersen et al. (2016) is due to our new more efficient optimization algorithm, cf. the comparison in Section 4.1.
The middle plot in Figure 1 is an image plot of the CV scores for ridge () where darker color corresponds to smaller CV score. The development of the CV scores across values of for fixed appears quite erratic with several local minima. In contrast, for each there appears to be a well-defined minimum for . As an example, the right plot in Figure 1 shows plotted against (where we replace the undefined by ). The computing time required to run the CV method with is hours with the same processor as used in the simulation study. Approximately seconds is required to estimate the parameters for the 9-species application using ridge with and .
The results regarding the multivariate dependence structure of the 9 species are qualitatively similar to those obtained by Waagepetersen et al. (2016). The estimated inter-species correlations , cf. (6), are shown in the left plot of Figure 2. Most of the pairs of species have a positive correlation. However, the correlations between Psychotria and the other species are mainly close to zero. The right plot in Figure 2 shows a hierarchical clustering of the species based on the estimated coefficient rows , where Psychotria appears to form its own cluster in agreement with the estimated inter-species correlations. This clustering may have some relation to the families of species as shown by the cluster of Protium p., Protium t. and Tetragastris which come from the same family (see Table LABEL:supp-tab:86spec in the supplementary material).
5.2 Application with 86 tree species
For the 86-species application, we apply the 8-fold CV procedure with and as in the previous section and . Figure 3 is similar to Figure 1. The left plot shows that consistently smaller CV scores are obtained with elastic net () and the smallest CV score is obtained with . The remaining plots are obtained with . The image plot of cross validation scores in the middle plot looks much smoother than in the 9 species case. The right plot shows a well defined minimum for given .
The computing time for the CV is 7.6 hours for and the computing time to estimate the parameters for the chosen and is 3.2 minutes. Out of parameters in the estimated , 13 were set to zero by the elastic net regularization. We thereby model distinct pair and cross pair correlation functions using only parameters. Thus we have indeed obtained a sparse model for the given data.
The distribution of estimated PVs is shown in Table 8. Most species () have estimated proportions of variances due to common factors less than 0.25.
Table 7 shows the distribution of estimated inter-species correlations due to common latent fields and the combination of common and species-specific fields (see Section 2.3) across 6 intervals. Most estimated correlations are positive. However, the correlations decrease a lot in absolute value when the species-specific fields are included (last row of Table 7).
Figure 4 shows a clustering of species based on estimated , . The leaves are marked with species life form. There may be some indication that species of life form “Tree” (life form number 4) tend to cluster together. However, one should be careful with this interpretation since apparent patterns like this could be due to sampling variation.
6 Conclusion
We developed in this study a regularized estimation method for highly multivariate point patterns modeled by multivariate log Gaussian Cox processes. The procedure is numerically stable and performs well both in the considered simulations and applications. In our truly highly multivariate second application, we were able to fit a sparse model for a multivariate point pattern with 86 types of points.
An interesting application of obtained estimates is to group types of points according to their estimated dependence on common latent fields as expressed by the rows . Hence a further development could be to consider an extension of the so-called fused LASSO (Tibshirani et al., 2005) by introducing regularization for differences . A further possibility would be to consider a sparse group LASSO (Simon et al., 2013) to obtain estimates of with some zeros of as developed in this paper and, in addition, with entire rows of zeros implying independence of corresponding types of points and all other types of points.
Acknowledgements The research by A. Choiruddin, F. Cuevas-Pacheco, and R. Waagepetersen is supported by The Danish Council for Independent Research — Natural Sciences, grant DFF – 7014-00074 ”Statistics for point processes in space and beyond”, and by the ”Centre for Stochastic Geometry and Advanced Bioimaging”, funded by grant 8721 from the Villum Foundation.
The BCI forest dynamics research project was made possible by National Science Foundation grants to Stephen P. Hubbell: DEB-0640386, DEB-0425651, DEB-0346488, DEB-0129874, DEB-00753102, DEB-9909347, DEB-9615226, DEB-9615226, DEB-9405933, DEB-9221033, DEB-9100058, DEB-8906869, DEB-8605042, DEB-8206-992, DEB-7922197, support from the Center for Tropical Forest Science, the Smithsonian Tropical Research k+1 Institute, the John D. and Catherine T. MacArthur Foundation, the Mellon Foundation, the Celera Foundation, and numerous private individuals, and through the hard work of over 100 people from 10 countries over the past two decades. The plot project is part of the Center for Tropical Forest Science, a global network of large-scale demographic tree plots.
The BCI soils data set were collected and analyzed by J. Dalling, R. John, K. Harms, R. Stallard and J. Yavitt with support from NSF DEB021104, 021115, 0212284, 0212818 and OISE 0314581, STRI and CTFS. Paolo Segre and Juan Di Trani provided assistance in the field. The covariates dem, grad, mrvbf, solar and twi were computed in SAGA GIS by Tomislav Hengl (http://spatial-analyst.net/). We thank Dr. Joseph Wright for sharing data on dispersal modes and life forms for the BCI tree species.
Appendix A Proximal Newton Method
Suppose we want to find the solution of
[TABLE]
where the function can be separated into two parts: the function which is a convex and twice continuously differentiable loss function and the function which is a convex but not necessarily differentiable penalty function. The proximal-Newton method is an iterative optimization algorithm that uses a quadratic approximation of the differentiable part :
[TABLE]
where is the current value of , is the first derivative of and is an approximation to the Hessian matrix . Letting , the next value of is obtained as
[TABLE]
for some . That is, is used to construct a search direction for the th value of . Theoretical results in Lee et al. [2014] show that can be chosen so that . The matrix can be chosen in various ways, see Lee et al. [2014] and Hastie et al. [2015] for more details.
In the following sections, we adapt the proximal Newton method to minimization of our objective function.
A.1 Quadratic approximation for updating
Let us first regard (8) as a function of ,
[TABLE]
To minimize (8), we consider the proximal Newton method stated in (15). In particular, we approximate by a quadratic approximation around the current value :
[TABLE]
Here, the first derivative is
[TABLE]
while is an approximation of the second derivative,
[TABLE]
where , denotes the first columns in , and C(\boldsymbol{\alpha}^{(k)}_{i\cdot})=4\text{Diag}\bigg{(}X_{ii,\cdot(1:q)}^{\mbox{\scriptsize\sf T}}\Big{(}Y_{ii}-X_{ii}\boldsymbol{\beta}_{ii}(\boldsymbol{\alpha}^{(k)},\boldsymbol{\sigma}^{2})\Big{)}\bigg{)}. Specifically,
[TABLE]
To ease the presentation and computation, we write from (17) in the form of a least squares problem
[TABLE]
where
[TABLE]
Replacing in (16) with we obtain the approximate objective function given in (9). Since (9) is a standard regularized least squares problem, we minimize (9) using a coordinate descent algorithm to obtain as detailed in Section B.2.
A.2 Theoretical result regarding proximal Newton update
Let where is the minimizer of (9) and according to a line search strategy let
[TABLE]
for some . Following the proof of Proposition 2.3 in Lee et al. [2014], we can verify the following theorem.
Theorem 1
Let . Then
[TABLE]
Thus, by Theorem 1, if is positive definite, we can choose so that . That is, the update of results in a decrease of the objective function (8).
Appendix B Algorithm
In our block descent algorithm, we minimize (7) with respect to , and in turn. For , we first update by minimizing (8) using least squares estimation followed by an update of by minimizing (9) using a coordinate descent method. We denote by the th column of for () or (). We detail, respectively in Appendices B.1 and B.2, the updates of and the coordinate descent updates of for . A summary of the final algorithm is given by Appendix B.3.
B.1 Update of
The parameter is updated using least squares methods. More precisely, the gradient of (8) with respect to is
[TABLE]
By solving , we obtain the update
[TABLE]
where is used to avoid negative results of the update.
B.2 Update of
Let , where and are specified in (10). Then we rewrite (9) as
[TABLE]
The gradient with respect to is
[TABLE]
Following the main argument by Friedman et al. [2010], the coordinate-wise update for is of the form
[TABLE]
where .
B.3 Algorithm to update
For a given and sequence of values , the overall procedure to estimate the parameters: is described by Algorithm 1. Note that estimates obtained with are used as initial values for the estimation with , .
Appendix C Plots and detail information of BCI data used in the analysis
Plots of 13 spatial covariates used for analysis are depicted in Figure 5. Figure 6 shows locations of the 9 selected tree species.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Baddeley et al. [2014] Adrian Baddeley, Aruna Jammalamadaka, and Gopalan Nair. Multitype point process analysis of spines on the dendrite network of a neuron. Journal of the Royal Statistical Society: Series C (Applied Statistics) , 63(5):673–694, 2014.
- 2Chilès and Delfiner [1999] Jean-Paul Chilès and Pierre Delfiner. Geostatistics: modeling spatial uncertainty . Probability and Statistics. Wiley, New York, 1999.
- 3Choi et al. [2010] Jang Choi, Gary Oehlert, and Hui Zou. A penalized maximum likelihood approach to sparse factor analysis. Statistics and its Interface , 3(4):429–436, 2010.
- 4Choiruddin et al. [2018] Achmad Choiruddin, Jean-François Coeurjolly, and Frédérique Letué. Convex and non-convex regularization methods for spatial point processes intensity estimation. Electronic Journal of Statistics , 12(1):1210–1255, 2018.
- 5Coeurjolly et al. [2017] Jean-François Coeurjolly, Jesper Møller, and Rasmus Waagepetersen. A tutorial on Palm distributions for spatial point processes. International Statistical Review , 85(3):404–420, 2017.
- 6Condit [1998] R. Condit. Tropical Forest Census Plots . Springer-Verlag and R. G. Landes Company, Berlin, Germany and Georgetown, Texas, 1998.
- 7Condit et al. [1996] Richard Condit, Stephen P Hubbell, and Robin B Foster. Changes in tree species abundance in a neotropical forest: impact of climate change. Journal of tropical ecology , 12(2):231–256, 1996.
- 8Diggle et al. [2005] Peter Diggle, Pingping Zheng, and Peter Durr. Nonparametric estimation of spatial segregation in a multivariate point process: bovine tuberculosis in Cornwall, UK. Journal of the Royal Statistical Society: Series C (Applied Statistics) , 54(3):645–658, 2005. ISSN 1467-9876. doi: 10.1111/j.1467-9876.2005.05373.x. URL http://dx.doi.org/10.1111/j.1467-9876.2005.05373.x .
