Component-based regularisation of multivariate generalised linear mixed models
Jocelyn Chauvet (IMAG), Catherine Trottier (IMAG), Xavier Bry (IMAG)

TL;DR
This paper introduces a component-based regularisation method for multivariate GLMMs that effectively handles high-dimensional, redundant explanatory variables by constructing orthogonal components, improving model estimation for grouped data.
Contribution
It proposes an extension of SCGLR for regularising multivariate GLMMs, combining component construction with an adapted estimation algorithm, tested on simulated and real data.
Findings
The method outperforms ridge and LASSO regularisations in grouped data scenarios.
Orthogonal components effectively capture relevant structural information.
The approach is validated on both simulated and real datasets.
Abstract
We address the component-based regularisation of a multivariate Generalised Linear Mixed Model (GLMM) in the framework of grouped data. A set Y of random responses is modelled with a multivariate GLMM, based on a set X of explanatory variables, a set A of additional explanatory variables, and random effects to introduce the within-group dependence of observations. Variables in X are assumed many and redundant so that regression demands regularisation. This is not the case for A, which contains few and selected variables. Regularisation is performed building an appropriate number of orthogonal components that both contribute to model Y and capture relevant structural information in X. To estimate the model, we propose to maximise a criterion specific to the Supervised Component-based Generalised Linear Regression (SCGLR) within an adaptation of Schall's algorithm. This extension of SCGLRâŚ
| GLMMâLASSO | LMMâridge | mixedâSCGLR | ||
| shrinkage parameter | shrinkage parameter | number of components | tradeâoff parameter | |
| 65 | 24 | 15 | 0.50 | |
| 92 | 54 | 5 | 0.58 | |
| 124 | 73 | 3 | 0.70 | |
| 163 | 78 | 3 | 0.73 | |
| 175 | 85 | 2 | 0.80 | |
| LMM | GLMMâLASSO | LMMâridge | mixedâSCGLR | |
|---|---|---|---|---|
| (no regularisation) | ||||
| GLMM | GLMMâLASSO | mixedâSCGLR | ||||
|---|---|---|---|---|---|---|
| (no regularisation) | ||||||
| Bernoulli | Poisson | Bernoulli | Poisson | Bernoulli | Poisson | |
| SCGLR | ||||||||
|---|---|---|---|---|---|---|---|---|
| mixedâSCGLR |
| GLMM | GLMMâLASSO | mixedâSCGLR | |||
|---|---|---|---|---|---|
| (no regularisation) | |||||
| Binomial | Poisson | Poisson | Binomial | Poisson | |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Componentâbased regularisation of multivariate generalised linear mixed models
Jocelyn Chauvet
Catherine Trottier 11footnotemark: 1
and
Xavier Bry11footnotemark: 1 IMAG, Univ Montpellier, CNRS, Montpellier, France.
[email protected] ; [email protected]Univ PaulâValĂŠry Montpellier 3, Montpellier, France.
Abstract
We address the componentâbased regularisation of a multivariate Generalised Linear Mixed Model (GLMM) in the framework of grouped data. A set of random responses is modelled with a multivariate GLMM, based on a set of explanatory variables, a set of additional explanatory variables, and random effects to introduce the withinâgroup dependence of observations. Variables in are assumed many and redundant so that regression demands regularisation. This is not the case for , which contains few and selected variables. Regularisation is performed building an appropriate number of orthogonal components that both contribute to model and capture relevant structural information in . To estimate the model, we propose to maximise a criterion specific to the Supervised Componentâbased Generalised Linear Regression (SCGLR) within an adaptation of Schallâs algorithm. This extension of SCGLR is tested on both simulated and real grouped data, and compared to ridge and LASSO regularisations. Supplementary material for this article is available online.
Keywords: generalised linear regression, supervised components, random effects, structural relevance.
1 Introduction
In the framework of regression models on a large number of explanatory variables with redundancies and collinearities, the search for a reduced number of relevant dimensions to model responses has been an ongoing research over the last decades. In particular, the case where the explanatory variables outnumber the observations tends to be a new standard. Generalised Linear Models (GLMs) are the most widely used regression models, because they are easy to interpret and address a very large scope of applications with a variety of response distributions. For instance, Epidemiology, Biology and Social Sciences need to model binary outcomes, count data and survival times. All these fields often have to deal with both grouped data and multivariate responses combining variables of different types (e.g. one binary and another Poisson). In this work, we particularly aim at modelling abundances of several tree genera on plots of land grouped into forest concessions, using multiple redundant explanatory variables.
As far as dimensionâreduction is concerned, two main approaches have been developed. The first one is variableâselection, whereas the second one builds components, i.e. linear combinations of the explanatory variables, which synthesise the useful part of their information. As far as variableâselection is concerned, the most popular method is currently the LASSO, introduced by Tibshirani, (1996), which combines the likelihood with a penalty based on the ânorm of the coefficient vector. LASSO is one of the penaltyâbased regularisation methods, as are also ridge (Hoerl and Kennard,, 1970) and elasticânet (Zou and Hastie,, 2005). This LASSO selection approach has proved efficient to explain the phenomenon of interest when some of the explanatory variables are the âtrueâ ones, surrounded by a high number of irrelevant others. Nevertheless, it may be very unstable and helpless when the true explanatory dimensions are latent and indirectly measured through highly correlated proxies. This is where the componentâbased approach turns out to be useful. Bry et al., (2013) have developed a new methodology named Supervised Componentâbased Generalised Linear Regression (SCGLR), later extended and refined in Bry et al., (2014, 2016, 2018). As in any PLSâtype method, the construction of components in SCGLR is guided both by the correlationâstructure of variables in the explanatory space and by the prediction quality of the responses. Nevertheless, unlike PLS, SCGLR involves a general and flexible criterion allowing to specify the type of structure components are wanted to align with in the explanatory space (e.g. variable bundles, principal components, other subspaces). Moreover, SCGLR searches for explanatory directions common to multiple responses with probability distributions in the exponential family, each response being entitled to their own distribution. The current SCGLR method is implemented in the R package SCGLR (Cornu et al.,, 2018) available at https://CRAN.R-project.org/package=SCGLR and https://github.com/SCnext/SCGLR.
In the present work, we aim at modelling responses with a repeated or grouped design. For this purpose, the use of mixed models with random effects is widespread. Research on varianceâcomponent estimation in Generalised Linear Mixed Models (GLMMs) has been very active since the 1980s. For the most general distribution assumptions in such models, parameter estimation faces the intractability of the likelihood expressed as an integral with respect to the random effects. Several numerical approximations of the integral have been proposed: Gaussian quadrature (Anderson and Aitkin,, 1985) or adaptive versions of it (Pinheiro and Bates,, 1995), Laplace approximation leading to the definition of the penalised quasiâlikelihood (Breslow and Clayton,, 1993) or modified versions of it (Shun and McCullagh,, 1995). An alternative to this type of analytic approximation is a stochastic approximation of the integral calculation via MCMC techniques. In this approach, Zeger and Karim, (1991) described an approximate Gibbs sampling for GLMMs, which was extended by Clayton, (1996) to more general MetropolisâHastings algorithms. In parallel, McCulloch, (1997) developed the Monte Carlo EM algorithm where the expectation is computed numerically through a Monte Carlo approximation, after generating random effects with a MetropolisâHastings sampler. Mention can also be made of the recent work by Knudson, (2016): her strategy is to approximate the entire likelihood function using random effects simulated from a parametrised importance sampling distribution depending on the data. Unfortunately, these different approaches are not necessarily suitable for the same types of random effect designs (oneâdimensional random effect, embedded random effects, etc). In the wake of the first type of approximations, we here adopt the âJointâMaximisationâ strategy (McCulloch,, 1997), as introduced for instance by Schall, (1991). The model is iteratively linearised conditional on the random effects and variance components are then estimated using adapted linear mixed models methods. This strategy can be used for any random effect design and is less computationally intensive than Monte Carlo methods. Moreover, it provides us with a linear setting more suitable for the computation of components. Once the components calculated, model parameters can be estimated using any of the aforementioned strategies (see Section 7).
Modelling grouped responses through a GLMM with a large number of explanatory variables is the focus of this paper. The need for dimensionâreduction and regularisation has to accommodate the presence of random effects in the model, but our main purpose still remains to investigate the explanatory structure and link it to interpretable dimensions. For Gaussian responses, Eliot et al., (2011) proposed to extend the ridge regression to Linear Mixed Models (LMMs). Based on a penalised complete logâlikelihood, the adaptation of the ExpectationâMaximisation algorithm they suggest includes a new step to find the best shrinkage parameter using a generalised crossâvalidation scheme at each iteration. More recently, Schelldorfer et al., (2014) â and also Groll and Tutz, (2014) â proposed an âpenalised algorithm for fitting a highâdimensional GLMM, using Laplace approximation and an efficient coordinate gradient descent. In this work, we combine Schallâs iterative model linearisation with regularisation at each step. However, we do not use a penalty on the coefficient vectorâs norm â as proposed by Zhang et al., (2017) within the framework of multivariate count data. We rather propose to combine dimensionâreduction and predictorâregularisation using supervised components aligning on the most predictive and interpretable directions in the explanatory space.
The paper is organised as follows. In Section 2, we formalise the model and set the main notations used throughout the paper. In Section 3, we present the key features of SCGLR. Section 4 designs an extension of this methodology to mixed models, and particularly to grouped data. In Section 5, our extended method âmixedâSCGLRâ is evaluated on simulations and compared to ridgeâ and LASSOâbased regularisations. Finally, in order to highlight the power of mixedâSCGLR in terms of model interpretation, Section 6 presents an application to real data in the Poisson case.
2 Model definition and notations
In the framework of a multivariate GLMM, we consider responseâvectors forming matrix , to be explained by two categories of explanatory variables. The first category consists of few weakly correlated variables \boldsymbol{A}_{n\times r}=\big{[}\,\boldsymbol{a_{1}}\mid\ldots\mid\boldsymbol{a_{r}}\,\big{]}. These variables are assumed to be interesting per se and their marginal effects need to be precisely quantified. The second category consists of abundant and highly correlated variables \boldsymbol{X}_{n\times p}=\big{[}\,\boldsymbol{x_{1}}\mid\ldots\mid\boldsymbol{x_{p}}\,\big{]} considered as proxies to latent dimensions which must be found and interpreted. Since explanatory variables in are few, nonâredundant and of interest, they are kept as such in the model. By contrast, may contain several unknown structurally relevant dimensions important to model and predict , how many we do not know. is thus to be searched for an appropriate number of orthogonal components that both capture relevant structural information in and contribute to model .
This work addresses grouped data: the observations form groups. Within each group, observations are not assumed independent. For each response , a âlevel random effect is used to model the dependence of observations within each group. Hence, each is modelled with a GLMM assuming a conditional distribution from the exponential family.
Notations and conventions
All variables (namely the âs, âs and âs) will be identified with âvectors.
We will use bold lowercase letters for vectors (e.g. ) and bold capital letters for matrices (e.g. ).
being any matrix, denotes the transpose of .
denotes the identity matrix of size .
denotes the allâones vector of size .
Let and be nonâzero vectors in and let be a symmetric positive definite matrix of size . Then refers to the Euclidean scalar product of and with respect to metric . The cosine of the angle between and with respect to is given by , where .
The space spanned by vectors is denoted by . being any matrix, refers to the space spanned by the columnâvectors of .
Let be endowed with metric and let be a matrix of size . Then refers to the âorthogonal projector onto . Let be a vector in . The cosine of the angle between and with respect to is defined by .
3 SCGLR with additional explanatory variables
In this section we consider the situation where each is modelled with a GLM (without random effect). For the sake of simplicity, we focus on the singleâcomponent SCGLR (). Section 3.1 briefly recalls some standards for univariate GLMs. Section 3.2 defines the linear predictors considered in the SCGLR methodology, in a multivariate GLM framework with additional explanatory variables. Finally, Section 3.3 introduces the criterion SCGLR maximises to compute the component.
3.1 Notations and main features of univariate GLMs
We refer the reader to McCullagh and Nelder, (1989) for a thorough overview of GLMs. This section is only intended to recall the classical iterative scheme performing maximum likelihood (ML) estimation. Let denote the matrix of explanatory variables and the âdimensional parameter vector. At iteration , the Fisher Scoring Algorithm (FSA) for ML estimation calculates
[TABLE]
where and respectively denote the classical working variable and the associated weight matrix at iteration . As pointed out by Nelder and Wedderburn, (1972), update (1) may be interpreted as a weighted least squares step in the linearised model defined by
[TABLE]
3.2 Linear predictors for SCGLR with multiple responses
We are now considering a multivariate GLM (Fahrmeir and Tutz,, 1994). In this context, SCGLR searches for a component common to all the âs. This component will be denoted and its âdimensional loadingâvector will be denoted , so that . The linear predictor associated with responseâvector then writes
[TABLE]
where and are the regression parameters associated respectively with component and additional explanatory variables . being common to all the âs, predictors are collinear in their âpart. For identification purposes, we impose , where may so far be any symmetric positive definite matrix. Let us note the âth observation of the âth responseâvector and the predictor set. We assume that the responses are independent conditional on , and that the observations are independent. The logâdensity then writes
[TABLE]
where is the logâdensity of the âth response, conditional on its linear predictor. As a result, being the working variable associated with and its variance matrix, the corresponding linearised model derived from the FSA at iteration is
[TABLE]
Although linearised models (2) and (4) seem very similar, (4) is no longer linear, owing to the product . An alternate version of the FSA must therefore be used:
- (i)
Given current values of all the âs and âs, a new loadingâvector is obtained by solving an SCGLRâspecific program (see Section 3.3 for details).
- (ii)
Given a current value of , each is regressed independently on \big{[}\,\boldsymbol{Xu}\mid\boldsymbol{A}\,\big{]} with respect to weight matrix , yielding new regression parameters and .
3.3 Calculating the component maximising an SCGLRâspecific criterion
For an easier reading of this part, we omit the index. For each , consider model endowed with weight matrix . As suggested in Bry et al., (2013), the best loadingâvector in the weighted leastâsquares sense would be the solution of
[TABLE]
The maximisation program also writes â , where
[TABLE]
Now, is a mere goodnessâofâfit (GoF) measure that does not take into account the closeness of component to interpretable directions in . The GoF measure, , must therefore be combined with a measure of structural relevance (SR).
Assume matrix consists of standardised numeric variables. Consider a weight system â e.g. â reflecting the a priori relative importance of variables. Also consider a weight matrix â e.g. â reflecting the a priori relative importance of observations. We define the most structurally relevant loadingâvector as the solution of
[TABLE]
where
[TABLE]
for the scalar product is commutative. Formula (6) is in fact a particular case of the SR criterion proposed by Bry and Verron, (2015); Bry et al., (2016). It can be viewed as a generalised average version of the usual dual PCA criterion: . For , (6) is called âVariableâPowered Inertiaâ (VPI). It should be stressed that for to be invertible, must be a column full rank matrix. In case of strict collinearities within , as it always happens in highâdimensional settings, we replace with the matrix of its principal components associated with nonâzero eigenvalues. The component is then sought as . We have , where is the matrix of corresponding unit-eigenvectors. Then, with . Bry et al., (2018) show that among all loadingâvectors such that , is that which has the minimum ânorm.
Tuning parameter allows to draw component towards more (greater ) or less (smaller ) local bundles of correlated variables, as depicted on Figure 1 in the particular instance of four coplanar variables. Informally, a bundle is a set of variables correlated âenoughâ to be viewed as proxies to the same latent dimension. The notion of bundle is flexible, and parameter tunes the level of withinâbundle correlation to be considered: the higher the correlation, the more local the bundle. Overall, taking draws the components towards global structural directions (namely the principal components) while taking higher leads to more local ones (ultimately, the variables themselves). The goal is to focus on the most interpretable directions.
Finally, let be a parameter tuning the importance of the SR relative to the GoF. SCGLR attempts a tradeâoff between (5) and (6) by solving
[TABLE]
or equivalently
[TABLE]
More detail can be found in Bry et al., (2018).
4 Extension to mixed models
We now propose to extend SCGLR to mixed models. This extension will be called âmixedâSCGLRâ. A particular focus is placed on grouped data, for which the independence assumption of observations is no longer valid. The withinâgroup dependence of each response is modelled with a random groupâeffect. Consequently, each is modelled with a GLMM. As in SCGLR, the responses are assumed to be independent conditional on the components. Section 4.1 presents the singleâcomponent mixedâSCGLR method. The underlying algorithm is given in Section 4.2. Considering only one component is generally not enough to explain the responses making it necessary to search for explanatory components, with . The way in which we extract higher rank components is explained in Section 4.3.
4.1 First component
The random groupâeffect is assumed different across responses. This leads to randomâeffect vectors , which are assumed independent and normally distributed:
[TABLE]
where denotes the number of groups. In this paper, variance components models will be considered. We assume , where is the group variance component associated with response . Linear predictors involved in mixedâSCGLR are expressed as
[TABLE]
where is the known random effectsâ design matrix. Predictor epitomises the way we capture the dependence between outcomes. Indeed, as component does not depend on , it captures a structural dependence between the various âs. By contrast, the random effect models the withinâgroup stochastic dependence of outcomes forming responseâvector .
Recall that the distribution of the data conditional on the random effects is supposed to belong to the exponential family. The FSA was adapted by Schall, (1991) to the GLMM dependence structure. The key idea is to extend Schallâs algorithm to the componentâbased predictors in (8).
4.1.1 Linearisation step
Let denote the link function for response , its first derivative and the conditional expectation (i.e. ). The working variable associated with is calculated through
[TABLE]
In view of the conditional independence assumption, the conditional variance matrix for is
[TABLE]
where and are known functions, and is the dispersion parameter related to . At iteration , the conditional linearised model for working vector is then defined by
[TABLE]
Besides the variance component estimation, an alternated estimation step has to be developed (as aforementioned in Section 3.2) to deal with the nonâlinearity of (9).
4.1.2 Estimation step
Calculating the component:
Given current values of all the âs, âs, âs and âs, a new component is calculated by solving a (7)âtype program. However, (5) has to be adapted to conditional linearised models âs, involving weight matrices âs. The appropriate goodnessâofâfit measure is
[TABLE]
Estimating the regression parameters and varianceâcomponents:
Given a current value of component , we apply Schallâs method with the linear predictors given in (8). New values of parameters and as well as new prediction are obtained by solving the following Henderson system (Henderson,, 1975):
[TABLE]
Finally, as mentioned by Schall, (1991), given prediction for , the update of the ML estimation of variance component is
[TABLE]
4.2 The algorithm
The conditional linearised models considered at iteration are given by (9). Algorithm 1 describes the âth iteration of the singleâcomponent mixedâSCGLR. It is repeated until stability of parameters is reached.
Step 1:
Computing the component. Set
Step 2:
Henderson systems. For each , solve the system
Call , and the solutions.
Step 3:
Updating varianceâcomponent estimates. For each , compute
Step 4:
Updating working variables and weighting matrices.
For each , compute
Incrementing:
Algorithm 1 Iteration of the singleâcomponent mixedâSCGLR
4.3 Extracting higher rank components
Let {\boldsymbol{F_{h}}}=\big{[}\,\boldsymbol{f_{1}}\mid\ldots\mid\boldsymbol{f_{h}}\,\big{]} be the matrix of the first components, where . An extra component must best complement the existing ones plus , i.e. \boldsymbol{A_{h}}=\big{[}\,\boldsymbol{F_{h}}\mid\boldsymbol{A}\,\big{]}. So must be calculated using as additional explanatory variables. Moreover, we must impose that be orthogonal to , i.e. . Component is thus obtained by solving
[TABLE]
where and .
In the online Supplementary Material, we give a simple tool to maximise, at least locally, any criterion on the unit sphere: the Projected Iterated Normed Gradient (PING) algorithm. In particular, PING solves (11)âtype programs, which give all components of rank . The rankâone component is computed using the same program with and .
5 Comparative results on simulated data
Five simulation studies have been implemented to assess our method. The first one (discussed in Section 5.1 â 5.4) focuses on LMMs. It compares the performances of mixedâSCGLR, LMMâridge (Eliot et al.,, 2011) and GLMMâLASSO (Groll and Tutz,, 2014; Schelldorfer et al.,, 2014). The second simulation (Section 5.5) extends the first one to binary and Poisson outcomes. All simulation studies have been performed using R (R Core Team,, 2017). To compute LASSO regressions, we have used the R package glmmLasso (Groll,, 2017). The extension of SCGLR to mixed models is available at https://github.com/SCnext/mixedSCGLR. Three additional simulations are presented in the online Supplementary Material. The first one reproduces the simulation scheme of Section 5.5 with binomial and Poisson outcomes. The second one assesses the performance of mixedâSCGLR on a different bundle structure and presents results concerning variance component estimates. The third one deals with high dimensional data.
5.1 Data generation
To generate grouped data, we consider groups and observations per group (i.e. a total of observations). The random effectsâ design matrix is then . Explanatory variables consist of three independent bundles: (15 variables), (10 variables) and (5 variables). Each explanatory variable is normally simulated with mean [math] and variance . Parameter tunes the level of redundancy within each bundle: the correlation matrix of bundle is
[TABLE]
where is the number of variables in . In order to enable comparison with LASSO and ridge and to focus on regularisation, our simulations do not involve additional explanatory variables (). Two random responses \boldsymbol{Y}=\big{[}\,\boldsymbol{y_{1}}\mid\boldsymbol{y_{2}}\,\big{]} are generated as
[TABLE]
such that is predicted only by bundle , only by bundle , and bundle plays no explanatory role. Our choice for the fixedâeffect parameters is
[TABLE]
Finally, for each , random effect and noise vectors are simulated respectively from
[TABLE]
where . For each value of , samples are generated according to Model (12).
5.2 Parameter calibration
In order to compare mixedâSCGLR with the ridge and LASSO regressions, we recall the regularisation parameters required by each method. For both LMMâridge and GLMMâLASSO methods, a unique shrinkage parameter has to be calibrated: and respectively. For mixedâSCGLR, three tuning parameters need to be calibrated: the number of components and the tradeâoff parameter , which are both regularisation parameters, and the bundleâlocality parameter . For greater clarity, the simulation focuses on the behaviour of and . As recommended by Bry et al., (2013), we set . In caseâstudies, has to be tuned to maximise the interpretability of components.
For both mixedâSCGLR and GLMMâLASSO, optimal regularisation parameters are obtained through a âfold crossâvalidation, withdrawing observations from each group every time. This could be termed âleaveâtwoâobservationsâout per group.â The data are thus divided into five parts , each containing observations, for each of the groups. Let be the âth observation of the âth response vector in the âth sample. Let also be the fit for with part removed. The crossâvalidation error in the âth sample, , is defined as
[TABLE]
where
[TABLE]
In the âth sample, the optimal number of components , the tradeâoff parameter , and the shrinkage parameter are selected to minimise the crossâvalidation error (13). We then define
[TABLE]
By contrast, Eliot et al., (2011) suggest to calibrate the ridge parameter at each step of their EM implementation, using the generalised crossâvalidation. We thus define
[TABLE]
where denotes the average of the ridge parameter values obtained over all the iterations of the EM algorithm in the âth sample.
Table 1 summarises the optimal regularisation parameters selected through crossâ validation. In both ridge and LASSO, the shrinkage parameter value increases with the level of redundancy . Whereas for mixed-SCGLR, when increases, decreases towards the true number of predictive variableâbundles: the greater the value of , the better mixedâSCGLR focuses on the structures in that contribute to model . Moreover, when increases, the tradeâoff parameter increases, meaning that regularisation requires a greater importance of the structural relevance relative to the goodnessâofâfit.
5.3 Comparison of the estimate accuracies
Once tuning parameters are obtained, we focus on the fixedâeffect estimatesâ accuracy. Since the responseâvectors and are normally distributed and have comparable orders of magnitude, the fixedâeffect relative errors are on the same scale. Then we consider a riskâaverse comparison criterion called âMean Upper Relative Squared Errorâ (MURSE) defined as
[TABLE]
where is the estimate of associated with sample . The MURSE values for mixedâSCGLR, LMMâridge and GLMMâLASSO are presented in Table 2. The LMM results obtained without regularisation are also presented. They were computed using the R package lme4 (Bates et al.,, 2015). In the latter case, relative errors increase dramatically with . Those of ridge and LASSO increase less drastically (but increase anyway) because these methods suffer from the high correlations among the explanatory variables. Except for , mixedâSCGLR provides the most accurate fixed effect estimates. Indeed, if there are no real bundles in (), searching for structures in may lead mixedâSCGLR to be slightly less accurate. Conversely, mixedâSCGLR takes advantage of the high correlations among the explanatory variables: the stronger the structures (high ), the more efficient the method.
5.4 Model interpretation
This section aims at highlighting the power of mixedâSCGLR for model interpretation. Figure 2 presents an example of the first component planes obtained for , with associated optimal parameter values and . We still impose . The first two components obtained are the ones which explain the responses. It clearly appears that is explained by bundle and by . Interestingly, although bundle is the one with maximum inertia (), it appears only along the third component, for having no explanatory part.
5.5 Additional simulations involving nonâGaussian outcomes
This section aims at assessing our method in the case of Bernoulli () and Poisson () distributions of responses. We still consider groups and observations per group. We keep design matrices and defined in Section 5.1, as well as the values of , , and . The group variance components are given by and so that for each , \widetilde{\boldsymbol{\xi_{k}}}\mathrel{\overset{}{\scalebox{1.5}[1.0]{\sim}}}\mathcal{N}_{N}\left(\boldsymbol{0},\;\varsigma_{k}^{2}\,\boldsymbol{I}_{N}\right). Then given and , we simulate \boldsymbol{Y}=\big{[}\,\boldsymbol{y_{1}}\mid\boldsymbol{y_{2}}\,\big{]} as
[TABLE]
where and . Again, for each value of , samples are generated according to Model (14). As in Section 5.2, tuning parameters are calibrated so as to minimise the crossâvalidation error (13). However, since and do not have the same range of values, the prediction errors have to be standardised. The crossâvalidation error for response in the âth sample is now given by
[TABLE]
Unlike in Section 5.3, the responseâvectors do not come from the same distribution and have different orders of magnitude. The fixedâeffect relative errors are thus not comparable. To compare mixedâSCGLR with GLMMâLASSO and classical GLMM (without regularisation), we thus use the Mean Relative Squared Error (MRSE) defined as
[TABLE]
where is the estimate of from the âth sample. MRSE values for the GLMM, mixedâSCGLR and GLMMâLASSO are presented in Table 3. For all methods, estimating a Bernoulli model is obviously a more challenging task than estimating a Poisson model. Regardless of the level of redundancy , both mixedâSCGLR and GLMMâLASSO outperform classical GLMM estimation. Compared with the Gaussian case (Section 5.3), the results deteriorate but (overall) the same behaviours are observed.
For , fixedâeffect estimates provided by mixedâSCGLR are less accurate than those provided by GLMMâLASSO. In this case, GLMMâLASSO has indeed a double advantage. First, many âs are true zeros. Unlike mixedâSCGLR, GLMMâLASSO often shrinks their estimates to exactly zero. Second, since the level of redundancy is low, GLMMâLASSO also provides accurate coefficient estimates of active variables.
By contrast, for , mixedâSCGLR takes advantage of redundancies within the explanatory variables. Thus, mixed-SCGLR outperforms GLMMâLASSO in this case, despite the sparse structure of the âs.
Even if the response variables are not Gaussian, the power of mixedâSCGLR for model interpretation is preserved. Graphical diagnoses similar to those provided in Section 5.4 are available in the Supplementary Material.
6 An application to forest ecology data
6.1 Data description
The present study is based on the Genus dataset of the CoForChange project (see http://www.coforchange.eu). The subsample we consider gives the abundance of common tree genera on Congo Basin land plots. These plots are grouped into forest concessions. To predict abundances, we have environmental variables, plus explanatory variables which code geology and anthropogenic interference. consists of all environmental variables which are:
physical factors linked to topography, rainfall or soil moisture,
photosynthesis activity indicators (the Enhanced Vegetation Indices, EVI, the NearâInfraRed indices, NIR, and the MidâInfraRed indices, MIR),
indicators which describe the tree height.
Physical factors are many and redundant: monthly rainfalls are highly correlated, and so are photosynthesis activity indicators. By contrast, geology and anthropogenic interference are weakly correlated and interesting per se. These variables are then considered as additional explanatory variables and included in matrix .
6.2 Model and parameter calibration
Abundances of species given in Genus are count data. For each , we consider a Poisson regression with link
[TABLE]
where is the âlevel randomâeffect vector used to model the dependence between the observations of within concessions. The first crossâvalidations we performed â with different fixed values of parameters and â indicated that four components were sufficient to capture most of the information in needed to model and predict responses. We therefore keep . The optimal values of tradeâoff and locality parameter and are then determined through another crossâvalidation. Using the same procedure and notations as in Section 5.2, the data are divided into five parts . Let be the size of .
[TABLE]
where
[TABLE]
On Figure 3, we plot the errors for parameter pairs , where
[TABLE]
Parameter grid therefore contains pair values. Selecting the best parameter pair from through a âfold crossâvalidation requires a computation time of about minutes (parallel computing on CPU cores, Intel Core i7â6700HQ, 2.6GHz). It should be noted that there is a risk of nonâconvergence when the tradeâoff parameter is too close to [math]. Indeed, if we consider no structural information ( exactly equal to [math]) in , mixedâSCGLR merely performs classical GLMM estimation and does not converge with this data. When , our algorithm converges but leads to fairly unstable estimates and high crossâvalidation errors because regularisation is then very weak. By contrast, the components calculated with are close to principal components. The associated errors are therefore stable in most cases, but rather high. Finally, leads to the lowest crossâvalidation errors, but only for . Indeed, when is not too high, mixedâSCGLR may focus on the most predictive structures of . However, parameter must not exceed a certain value, in order to avoid being drawn towards too local variableâbundles. As can be seen, choosing minimises the crossâvalidation error.
6.3 Prediction quality and interpretation results
This part evaluates the benefits obtained by taking withinâgroup dependence into account. The predictions we get with mixedâSCGLR and with initial version of SCGLR are compared with respect to the crossâvalidation criterion given by (15). Table 4 summarises the âs for both SCGLR and mixedâSCGLR methods. Optimal parameter value triplet is selected for both methods. For each , mixedâSCGLR gives a lower crossâvalidation error than SCGLR: taking into account the withinâgroup dependence has clearly improved prediction performances.
Moreover, mixedâSCGLR enables to correctly reconstitute observed abundance maps, as illustrated on Figure 4.
As has been seen in Section 5.4, mixedâSCGLR allows an easy interpretation of the model through the decomposition of linear predictors on interpretable components. Figure 5 shows the first two component planes resulting from mixedâSCGLR on real data Genus. Component plane reveals two patterns. The first one is a global rainâwind pattern driven by the pluvioâs and wdâs variables which explain the abundances of Species . The second is a rather local pattern driven by variables altitude, wetness and annual pluviometry (pluvioan) which prove important to model and predict responses and . Lastly, Component reveals a photosynthesis pattern driven by a part of the Eviâs, which seems useful to predict and .
The decomposition of linear predictors on interpretable components allows to detect the species that tend to share common explanatory dimensions and those which are more idiosyncratic. We can then identify the variableâbundles these dimensions are related to. The underlying goal is a better understanding of the bioâ and ecosystem diversity with a view to preserve them. Species 1, 2, 5 and 6 are sensitive to the same rainâwind regime, and Species 4 and 8 are explained by the same photosynthetic pattern. On the contrary, Species 3 and 7 are clearly separated. Species 7 grows at high altitudes where the atmosphere is rather dry while the abundance of Species 3 is favoured by regular rainfall and high humidity.
7 Discussion and Conclusions
Like Sufficient Dimension Reduction (SDR) methods, mixedâSCGLR is based on the construction of a reduction function of dimension less than which tries to capture all the relevant information that contains about . However, the two approaches do not exactly pursue the same objectives. Indeed, SDR methods look for the âcentral subspaceâ containing the predictive information irrespective of the structures within (e.g. dimensions capturing a large part of âs variance, or bundles of correlated variables). MixedâSCGLR rather aims at basing the explanatory subspace on such structural dimensions so as to both gain interpretability and stabilise prediction. We think that extracting a hierarchy of strong and interpretable dimensions, and decomposing the linear predictor on them, is an essential asset in modelâbuilding. The difference in goals entails a difference in means: SDR is based on the sufficiency principle, which is enough to identify a subspace but not to track strong predictive dimensions in it. By contrast, in the wake of PLS regression, mixedâSCGLR uses a criterion combining goodnessâofâfit and structural relevance of components.
The supervisedâcomponent paradigm has proved effective in situations where regularisation is necessary but where variable selection is inappropriate â for instance when the true explanatory dimensions are latent and indirectly measured through highly correlated proxies.
When , tradeâoff parameter allows to continuously tune the attraction of components towards the principal components of explanatory variables. This results in a continuum between classical GLMM estimation ( is associated with no regularisation) and principal component generalised linear mixed regression (with ).
When , we take better advantage of local predictive structures in . The components we build are then usually closer to local gatherings of variables, thus easier to interpret.
MixedâSCGLR is able to identify more or less local predictive structures common to all the âs and performs well on grouped data with Gaussian, Bernoulli, binomial and Poisson outcomes. Compared to penaltyâbased approaches as ridge or LASSO, the orthogonal components built by mixedâSCGLR reveal the multidimensional explanatory and predictive dimensions, and greatly facilitate the interpretation of the model.
However, a natural question arises as to the accuracy of our methodology under significant deviations from normality. With binary data for instance, variance component estimates are prone to some bias towards zero (McCulloch,, 1997). That is why other estimation strategies might be considered, especially Monte Carlo integration methods which have the advantage of being based on direct approximations of the likelihood. Some examples are the MCMC methods developed by Hadfield, (2010) in the GLMM framework, and the Monte Carlo Likelihood Approximation (MCLA) proposed by Knudson, (2016). Indirect maximisations of the likelihood are also available such as Monte Carlo ExpectationâMaximisation (MCEM) and Monte Carlo NewtonâRaphson (McCulloch,, 1997). We think that these methodologies and Schallâs could be combined sequentially. Indeed we could first take advantage of the linear approximation of the model in order to build the components, and then use MCâbased methods to estimate both fixedâeffect parameters and variance components. This would lead to replacing the current iteration of mixedâSCGLR given by Algorithm 1 with the following steps (to keep things simple, we take the canonical link):
Compute components \boldsymbol{F}=\big{[}\,\boldsymbol{f_{1}}\mid\ldots\mid\boldsymbol{f_{K}}\,\big{]} via the PING algorithm on Schallâs linearised models.
- 2.
For each , consider the hierarchy
[TABLE]
where , and , are the functions associated with the natural parametrisation of the GLM. For example, for the Bernoulliâlogistic regression, we have: and .
- 3.
Apply MCâbased methods such as MCMC, MCLA, MCEM or MCNR to update , , and , .
- 4.
Update working variables and weight matrices to define the new Schallâs linearised models.
Even though such MCâbased methods are computationally much more intensive than the âJointâMaximisationâ and have intrinsic disadvantages (particularly in the assessment of convergence and in the choice of prior distributions), they could give better results in case of binary data.
SUPPLEMENTARY MATERIAL
Additional simulations:
The first simulation reproduces that of Section 5.5 in the case of binomial and Poisson outcomes. The second simulation explores a different structure of variableâbundles, considers Gaussian, binomial and Poisson outcomes, and presents results concerning variance component estimates. The third one involves high dimensional data. (pdf file)
Projected Iterated Normed Gradient (PING) algorithm:
We give some technical details about the PING algorithm, which maximises, at least locally, any criterion on the unit sphere. (pdf file)
R package mixedSCGLR:
We provide an R package to perform mixedâSCGLR, also available at https://github.com/SCnext/mixedSCGLR. It contains the dataset Genus used in Section 6. The package also provides demo codes, in particular for visualising the component planes (mixedSCGLR.tar.gz).
Code for running simulations:
We also provide the R codes required to reproduce most of the simulation results (R and Rdata files).
ACKNOWLEDGMENTS
The extended data Genus required the arrangement and the inventory of 140.000 developed plots across four countries : Central African Republic, Gabon, Cameroon and Democratic Republic of Congo. The authors thank the members of the CoForTips project for allowing the use of this data. We are also grateful to the editor, the associate editor and to the referees for their thorough and constructive review of this work.
**Supplementary file to âComponentâbased regularisation of multivariate generalised linear mixed modelsâ:
THE PING ALGORITHM**
Jocelyn Chauvet1 âCatherine Trottier1,2 âXavier Bry1
1 IMAG, Univ Montpellier, CNRS, Montpellier, France.
[email protected] ; [email protected]
2 Univ Paul-ValĂŠry Montpellier 3, Montpellier, France.
The Projected Iterated Normed Gradient (PING) is a basic extension of the iterated power algorithm, for solving any program of the form
[TABLE]
Note that putting , and , Program (16) is strictly equivalent to Program (17):
[TABLE]
Denoting
[TABLE]
a Lagrange multiplier-based reasoning gives the basic iteration of the PING algorithm:
[TABLE]
Although Iteration (18) follows a direction of ascent, it does not guarantee that actually increases on every step. We therefore propose a generic iteration of PING (Algorithm 2) and an alternative one (Algorithm 3), which both ensure that the criterion increases.
while convergence of non reached do
     Â
     Â
A unidimensional NewtonâRaphson maximisation procedure is used to find
     Â
the maximum of on the arc and take it as .
     Â
end while
Algorithm 2 Generic iteration of the PING algorithm
while convergence of non reached do
     Â
      while  do
           Â
      end while
     Â
     Â
end while
Algorithm 3 Alternative generic iteration of the PING algorithm
First rank component.
Component is obtained by solving
[TABLE]
This corresponds to Program (16) with , where
,
(the matrix of additional explanatory variables), and
.
In this particular case, we have , and so:
[TABLE]
Higher rank components.
Let {\boldsymbol{F_{h}}}=\big{[}\,\boldsymbol{f_{1}}\mid\ldots\mid\boldsymbol{f_{h}}\,\big{]} be the matrix of the first components and \boldsymbol{A_{h}}=\big{[}\,\boldsymbol{F_{h}}\mid\boldsymbol{A}\,\big{]}. Let denote the weight matrix reflecting the a priori relative importance of observations ( if all observations are of equal importance). Component is obtained by solving
[TABLE]
This corresponds to Program (16), where
,
\boldsymbol{A_{h}}=\big{[}\,\boldsymbol{F_{h}}\mid\boldsymbol{A}\,\big{]}, and
.
Initialisation.
To quickly find , algorithm PING is initialised with the first PLS component of the responses on . In like manner, for , PING is initialised with the first PLS component of the responses on deflated on components \boldsymbol{F_{h-1}}=\big{[}\,\boldsymbol{f_{1}}\mid\ldots\mid\boldsymbol{f_{h-1}}\,\big{]}.
**Supplementary file to âComponentâbased regularisation of multivariate generalised linear mixed modelsâ:
ADDITIONAL SIMULATIONS**
Jocelyn Chauvet1 âCatherine Trottier1,2 âXavier Bry1
1 IMAG, Univ Montpellier, CNRS, Montpellier, France.
[email protected] ; [email protected]
2 Univ Paul-ValĂŠry Montpellier 3, Montpellier, France.
8 Comparative results with binomial and Poisson outcomes
In this section, we simply extend the simulation scheme presented in Section 5.5 to binomial and Poisson outcomes. We maintain design matrices and as defined in Section 5.1. Fixedâeffect parameters âs and randomâeffect vectors âs are defined in Section 5.5. Given and , we then simulate \boldsymbol{Y}=\big{[}\,\boldsymbol{y_{1}}\mid\boldsymbol{y_{2}}\,\big{]} as
[TABLE]
Table 5 gives the Mean Relative Squared Error (MRSE) values for and obtained on 100 samples for each value of .
For the Poisson distribution, the results in Table 5 are essentially identical to those in the article: mixedâSCGLR outperforms GLMMâLASSO (Groll,, 2017, R package glmmLasso) except for . As for the binomial distribution, the regularisation provided by mixedâSCGLR improves the results obtained without regularisation (Bates et al.,, 2015, R package lme4), regardless of the level of redundancy within the explanatory variables. Unsurprisingly, the errors are much smaller than in the binary case.
The power of mixedâSCGLR in terms of model interpretation remains the same for nonâGaussian outcomes. Figure 6 (respectively Figure 7) presents an example of the first component planes output by mixedâSCGLR in the binomial/Poisson (respectively Bernoulli/Poisson) case. As for Gaussian outcomes, the component planes reveal that is explained by bundle and by . In the binomial/Poisson case with (Figure 6), predictive bundles and are captured respectively by the first and the second components. The third component aligns on nuisance bundle , despite its high inertia. Figure 7 illustrates what may happen when the level of redundancy is very high ( here). Since the explanatory variables are highly correlated, mixedâSCGLR regularisation requires that the structural relevance be given a heavy weight with respect to the goodnessâofâfit, which leads to a tradeâoff parameter close to 1 ( here). Having the greatest structural strength, the nuisance bundle is captured by the second component despite its lack of explanatory power. This is sometimes the price to be paid for the tradeâoff. In our example, the second explanatory bundle is captured by the third component, so that the predictive dimensions are accurately represented in component plane .
9 A new structure for the bundles â Gaussian, Poisson and binomial distributions
This simulation study tests mixed-SCGLR on a slightly more complex bundle structure. Results concerning variance component estimates are also presented.
We consider a fixedâeffect design matrix partitioned into 3 blocks , and . Block contains 10 predictive explanatory variables structured about a latent variable \boldsymbol{\varphi_{1}}\mathrel{\overset{}{\scalebox{1.5}[1.0]{\sim}}}\mathcal{N}_{n}\left(\boldsymbol{0},\sigma_{\text{LV}}^{2}\boldsymbol{I}_{n}\right). Thus for each , , where \boldsymbol{\varepsilon_{j}}\mathrel{\overset{}{\scalebox{1.5}[1.0]{\sim}}}\mathcal{N}_{n}\left(\boldsymbol{0},\sigma_{\text{noise}}^{2}\boldsymbol{I}_{n}\right) such that . The correlation within is tuned by signal to noise (StN) ratio (chosen in in practice). contains a single predictive variable \boldsymbol{\varphi_{2}}=\boldsymbol{x_{11}}\mathrel{\overset{}{\scalebox{1.5}[1.0]{\sim}}}\mathcal{N}_{n}\left(\boldsymbol{0},\boldsymbol{I}_{n}\right). contains 20 unstructured noise variables: for each , \boldsymbol{x_{j}}\mathrel{\overset{}{\scalebox{1.5}[1.0]{\sim}}}\mathcal{N}_{n}\left(\boldsymbol{0},\boldsymbol{I}_{n}\right). For each , randomâeffect vectors are simulated as \boldsymbol{\xi_{k}}\mathrel{\overset{\text{ind.}}{\scalebox{1.5}[1.0]{\sim}}}\mathcal{N}_{N}\left(\boldsymbol{0},\sigma_{k}^{2}\,\boldsymbol{I}_{N}\right). Given , , , we simulate 3 responses having different distributions, \boldsymbol{Y}=\big{[}\,\boldsymbol{y_{1}}\mid\boldsymbol{y_{2}}\mid\boldsymbol{y_{3}}\,\big{]}, as
[TABLE]
In our simulations, we set , and .
We consider in turn and groups, and observations per group ( and observations in total). samples are generated for each pair of values . The main goal of the study is to assess the ability of mixedâSCGLR to track down both latent variable and predictive variable . For and , we then define
[TABLE]
where is the component most correlated with issued from mixedâSCGLR in the âth sample. Consistency of fixedâeffect estimates is assessed through criteria , and defined by
[TABLE]
where is the fixedâeffect estimate related to response associated with sample .
Table 6 summarises the values of the aforeâdefined criteria and presents biases and standard errors of variance components estimates. For a given value of , increases towards with ratio : the tighter the block is structured about its latent variable, the better mixedâSCGLR can reconstruct it. The associated criterion then naturally decreases towards [math]. On the other hand, and are very stable, which proves that mixedâSCGLR is able to detect an isolated predictive variable among a large number of irrelevant others. As depends on how accurately mixedâSCGLR recovers and , it slightly decreases when the StN ratio increases. Both variance component biases and standard errors seem rather stable regardless of the value of StN. Finally, when increases, all the âs increase towards 1 and all the âs decrease towards 0. As far as variance component estimates are concerned, the biases are getting slightly closer to 0 and the standard errors decrease significantly.
Model interpretation is revealed by Figure 8 in the case of groups and observations per group. The first component aligns with block which alone explains response . The second aligns with (containing single explanatory variable ) which alone explains . Finally, note that the projection of the âpart of the linear predictor related to is well represented on component plane . This indicates that is explained jointly by and .
10 High dimensional data
10.1 Key idea
To cope with high dimensional data, the key idea is to replace the fixedâeffect design matrix, , with the matrix of its principal components associated with nonâzero eigenvalues. More precisely, being the eigenvalue associated with the âth eigenvector , the last eigenvector we consider, , is such that
[TABLE]
where is the number of columns of matrix . The matrix of the corresponding unit-eigenvectors is denoted \boldsymbol{V}=\big{[}\,\boldsymbol{v_{1}}\mid\ldots\mid\boldsymbol{v_{r}}\,\big{]}, and . The component is then sought as a combination of the principal components: , where . MixedâSCGLR then solves
[TABLE]
where the goodnessâofâfit measure, , is given by
[TABLE]
and the structural relevance by
[TABLE]
This idea is tested on simulated data where the number of explanatory variables exceeds the number of observations .
10.2 Data generation
To generate grouped data, we consider groups and observations per group (i.e. a total of observations). The random effectsâ design matrix is then . Explanatory variables consist of four independent bundles , such as \boldsymbol{X}=\big{[}\,\boldsymbol{X_{0}}\mid\boldsymbol{X_{1}}\mid\boldsymbol{X_{2}}\mid\boldsymbol{X_{3}}\,\big{]}. Each explanatory variable is normally simulated with mean [math] and variance . Parameter tunes the level of redundancy within each bundle: the correlation matrix of bundle is
[TABLE]
where is the number of variables in . For each , randomâeffect vectors are simulated as \boldsymbol{\xi_{k}}\mathrel{\overset{\text{ind.}}{\scalebox{1.5}[1.0]{\sim}}}\mathcal{N}_{N}\left(\boldsymbol{0},\sigma_{k}^{2}\,\boldsymbol{I}_{N}\right). Given , , , , we simulate 4 responses having different distributions, \boldsymbol{Y}=\big{[}\,\boldsymbol{y_{1}}\mid\boldsymbol{y_{2}}\mid\boldsymbol{y_{3}}\mid\boldsymbol{y_{4}}\,\big{]}, as
[TABLE]
Response is predicted only by bundle , only by bundle , only by bundle , by both bundles and , and bundle plays no explanatory role. Our choice for the fixedâeffect parameters is
[TABLE]
We consider in turn (, , , ) and (, , , ) explanatory variables. Variance components are set to , and . For each value of and for each value of , samples are generated according to Model (19).
10.3 Results
Table 7 and Table 8 present the results for respectively and explanatory variables. They give the Mean Relative Squared Error (MRSE) values for , as well as biases and standard errors of estimated variance components, obtained on 20 samples for each value of .
Some component planes are given on Figure 9 ( explanatory variables) and Figure 10 ( explanatory variables).
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Anderson and Aitkin, (1985) Anderson, D. A. and Aitkin, M. (1985). Variance Component Models with Binary Response: Interviewer Variability. Journal of the Royal Statistical Society, Series B (Methodological) , 47(2):203â210.
- 2Bates et al., (2015) Bates, D., Mächler, M., Bolker, B., and Walker, S. (2015). Fitting linear mixed-effects models using lme 4. Journal of Statistical Software , 67(1):1â48.
- 3Breslow and Clayton, (1993) Breslow, N. E. and Clayton, D. G. (1993). Approximate Inference in Generalized Linear Mixed Models. Journal of the American Statistical Association , 88(421):9â25.
- 4Bry et al., (2018) Bry, X., Trottier, C., Mortier, F., and Cornu, G. (2018). Component-based regularisation of a multivariate GLM with a thematic partitioning of the explanatory variables. Statistical Modelling . In press.
- 5Bry et al., (2014) Bry, X., Trottier, C., Mortier, F., Cornu, G., and Verron, T. (2014). Extending SCGLR to multiple regressor-groups: The Theme-SCGLR method. In Proceedings of the eighth International Conference on Partial Least Squares and Related Methods , Paris, France.
- 6Bry et al., (2016) Bry, X., Trottier, C., Mortier, F., Cornu, G., and Verron, T. (2016). The Multiple Facets of Partial Least Squares and Related Methods , chapter Supervised Component Generalized Linear Regression with Multiple Explanatory Blocks: THEME-SCGLR, pages 141â154. Springer International Publishing.
- 7Bry et al., (2013) Bry, X., Trottier, C., Verron, T., and Mortier, F. (2013). Supervised component generalized linear regression using a PLS-extension of the Fisher scoring algorithm. Journal of Multivariate Analysis , 119(4):47â60.
- 8Bry and Verron, (2015) Bry, X. and Verron, T. (2015). THEME: TH Ematic Model Exploration through multiple co-structure maximization. Journal of Chemometrics , 29(12):637â647.
