Primal path algorithm for compositional data analysis
Jong-June Jeon, Yongdai Kim, Sungho Won, Hosik Choi

TL;DR
This paper introduces an efficient solution path algorithm for $l_1$ regularized regression and classification models tailored for compositional data, addressing computational challenges in high-dimensional settings.
Contribution
It develops a novel, faster algorithm for compositional data analysis that handles linear constraints more efficiently than existing methods.
Findings
Algorithm significantly reduces computation time.
Effective in high-dimensional microbiome data analysis.
Extended to classification tasks with compositional predictors.
Abstract
Compositional data have two unique characteristics compared to typical multivariate data: the observed values are nonnegative and their summand is exactly one. To reflect these characteristics, a specific regularized regression model with linear constraints is commonly used. However, linear constraints incur additional computational time, which becomes severe in high-dimensional cases. As such, we propose an efficient solution path algorithm for a regularized regression with compositional data. The algorithm is then extended to a classification model with compositional predictors. We also compare its computational speed with that of previously developed algorithms and apply the proposed algorithm to analyze human gut microbiome data.
| 200 | 500 | 1000 | 200 | 500 | 1000 | |
|---|---|---|---|---|---|---|
| comlasso | 0.23 (0.01) | 0.22 (0.00) | 0.22 (0.00) | 0.52 (0.01) | 0.54 (0.01) | 0.55 (0.01) |
| genlasso | 0.27 (0.01) | 3.25 (0.06) | 19.23 (0.33) | 0.43 (0.01) | 4.18 (0.07) | 30.44 (0.34) |
| 20 | 50 | 100 | 20 | 50 | 100 | |
|---|---|---|---|---|---|---|
| comlasso | 0.26 (0.01) | 0.23 (0.00) | 0.24 (0.01) | 0.67 (0.02) | 0.83 (0.02) | 0.90 (0.02) |
| genlasso | 21.93 (0.48) | 20.57 (0.36) | 18.84 (0.36) | 35.16 (0.53) | 36.75 (0.67) | 32.49 (0.55) |
| comlassoA | comlassoB | Wilcoxon test | |
|---|---|---|---|
| No. | Genera (sel. Prob.) | Genera (sel. Prob.) | Genera (p-value) |
| 1 | F. Lactobacillus (0.744) | F. Lactobacillus (0.645) | F. Faecalibacterium (0.001) |
| 2 | A. Bifidobacterium (0.468) | F. Phascolarctobacterium (0.357) | P. Sutterella (0.003) |
| 3 | P. Sutterella (0.418) | F. Faecalibacterium (0.334) | F. Oscillospira (0.004) |
| 4 | B. Parabacteroides (0.402) | F. Christensenella (0.324) | F. Dehalobacterium (0.006) |
| 5 | P. Desulfovibrio (0.360) | F. Roseburia (0.314) | P. Oxalobacter (0.006) |
| 6 | F. Enterococcus (0.341) | A. Eggerthella (0.297) | P. Desulfovibrio (0.008) |
| 7 | F. Turicibacter (0.328) | A. Bifidobacterium (0.274) | B. Butyricimonas (0.022) |
| 8 | A. Eggerthella (0.290) | B. Prevotella (0.260) | F. Christensenella (0.022) |
| 9 | A. Collinsella (0.270) | F. Clostridium (0.250) | B. Parabacteroides (0.053) |
| 10 | F. Roseburia (0.264) | F. Enterococcus (0.240) | F. Lactobacillus (0.053) |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGeochemistry and Geologic Mapping · Hydrocarbon exploration and reservoir analysis · Mineral Processing and Grinding
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
Primal path algorithm for compositional data analysis
Jong-June Jeon
Department of Statistics & Natural Science Research Institute University of Seoul,
Yongdai Kim
Department of Statistics, Seoul National University,
Sungho Won
Department of Public Health Science, Seoul National University,
and Hosik Choi
Department of Applied Statistics, Kyonggi University
Abstract
Compositional data have two unique characteristics compared to typical multivariate data: the observed values are nonnegative and their summand is exactly one. To reflect these characteristics, a specific regularized regression model with linear constraints is commonly used. However, linear constraints incur additional computational time, which becomes severe in high-dimensional cases. As such, we propose an efficient solution path algorithm for a regularized regression with compositional data. The algorithm is then extended to a classification model with compositional predictors. We also compare its computational speed with that of previously developed algorithms and apply the proposed algorithm to analyze human gut microbiome data.
Keywords: penalized regression; constraint; solution path algorithm; microbiome
1 Introduction
In modern regression analysis, it is frequently observed that regression predictors consist of the proportions or relative ratios of certain values rather than absolute values. For example, in analyzing air pollution data, the percentages of chemicals in the air are considered relevant predictors to identify the source of a pollutant (Lee et al.,, 2007). These types of proportional data, typically called compositional data, are widely used in geoscience (Buccianti et al.,, 2006), microbiology (Montassier et al.,, 2016), and nutritional biochemistry (Leite,, 2016). By the definition of compositional data, all compositional predictors lie on the simplex and are thus linearly dependent.
Aitchison and Bacon-shone, (1984) proposed a regression model for compositional data as follows. Let be a real valued response vector and the predictors, where is an element on the -dimensional standard simplex. Instead of modeling on directly, Aitchison and Bacon-shone, (1984) introduced a specific transformation of , in which by the logarithm of each component of the vector, and proposed the following regression model with a constraint:
[TABLE]
where , \mbox{\boldmath{\beta}}=(\beta_{1},\cdots,\beta_{p})^{\top}, , and is a -dimensional random vector with mean zero and finite variances. Subsequently, Lin et al., (2014) adopted (1) as a regression model with compositional data and proposed the regularized estimator of the regression coefficients, given by
[TABLE]
However, the standard optimization algorithm for the regularized estimator is not directly applicable to solving (1) due to constraint {\bf{1}^{\top}}\mbox{\boldmath{\beta}}=0. For example, the coordinate-wise algorithm (Friedman et al.,, 2010), one of the most popular algorithms for regularization, cannot be used because the penalty function is non-separable (Tseng and Yun,, 2009). Alternatively, quadratic programming (Brodie et al.,, 2009; Bondell and Reich,, 2009) or the alternating direction method of multipliers (ADMM) algorithm can be used (Lin et al.,, 2014; Fang et al.,, 2015). However, all these algorithms solve problem (1) for a fixed and do not provide a solution path.
Various solution path algorithms for the regularized estimator with linear constraints have been developed as to provide a solution path by Tibshirani and Taylor, (2011), Zhou and Lange, (2013), and Zhou and Wu, (2014). However, these algorithms have their own problems when applied to high-dimensional data. For example, the generalized lasso (genlasso) algorithm of Tibshirani and Taylor, (2011) does not generally provide a solution path when and uses the additional penalty to solve this problem. However, adding the penalty not only increases computational complexity but also makes the corresponding solution path inaccurate (see Section 3 for details on this problem). The algorithms of Zhou and Lange, (2013), Zhou and Wu, (2014), and Gaines et al., (2018) solve general problems, including (1). However, a twice-differentiable loss function is required to solve the problem, which precludes further extensions such as a Huberized regression. Moreover, all aforementioned algorithms are not optimal for solving (1), which has specific constraints.
The aim of this paper is thus to propose an algorithm to construct a solution path \{\hat{}\mbox{\boldmath{\beta}}(\lambda):\lambda\geq 0\}, in which
[TABLE]
Where , \mbox{\boldmath{\beta}}=(\mbox{\boldmath{\beta}}_{1}^{\top},\cdots,\mbox{\boldmath{\beta}}_{K}^{\top})^{\top}\in\mathbb{R}^{p} with \mbox{\boldmath{\beta}}_{k}\in\mathbb{R}^{p_{k}}, and . We let L({\bf y},X\mbox{\boldmath{\beta}})=\sum_{i=1}^{n}l(y_{i},{\bf x}_{i}^{\top}\mbox{\boldmath{\beta}}) and consider as the class of piecewise quadratic loss functions (Rosset and Zhu,, 2007) including quadratic, Huberized, and quadratic hinge loss. In the regression model, , and in the classification model, . Additionally, we introduce constant vectors for to consider the more general constraint in the problem. For example, multiple compositional predictors can be included in the model. We call the proposed algorithm the composite-lasso (comlasso).
The main contribution of this paper is clearly describing the event of violation for the Karush-Kuhn-Tucker (KKT) conditions corresponding to the linear constraint in (1). We monitor the event and employ the monitoring result to determine the solution path. This enables us to seek the exact initial solution referred to in Gaines et al., (2018) when and also provides an efficient way to find the exact activation set in the solution path. To the best of our knowledge, this is the first paper to provide an exact description of the violation. Therefore, comlasso has the following advantages compared to other algorithms: it is computationally more efficient, especially for high-dimensional data, requires no modification by adding the penalty, and works with various loss functions other than square loss. In this paper, the intercept of the model is not considered for convenience. However, the proposed algorithm can be easily modified to include the intercept.
The remainder of the paper is organized as follows. Section 2 explains the KKT conditions for problem (1) and describes the solution path generation rule. Section 3 provides the results of the simulation study and the microbiome data analysis. Section 4 presents concluding remarks.
Finally, we define the notations used in this paper. For a positive integer and denote the -dimensional vector whose elements are and [math], respectively. For an index set , denotes the subvector of , whose elements are chosen by index set . Similarly denotes the submatrix of , whose columns are drawn according to index set . When is singleton, we denote the index set by its element (e.g., and for ). Let for be the consecutive partition of , with , where is the cardinality of set . Let be the membership of , that is, . We denote L({{\bf y}},X\mbox{\boldmath{\beta}}) simply by L(\mbox{\boldmath{\beta}}). Finally, is the elementwise inequality in the vector.
2 Primal path algorithm for comlasso
The almost quadratic loss function is defined by l(y,{\bf x}^{\top}\mbox{\boldmath{\beta}})=a(r)r^{2}+b(r)r+c(r), where , and are piecewise constants depending on , which is a function of and {\bf x}^{\top}\mbox{\boldmath{\beta}}. In the regression model, is the residual, y-{\bf x}^{\top}\mbox{\boldmath{\beta}}, and in the classification model, is the margin, y{\bf x}^{\top}\mbox{\boldmath{\beta}}. Rosset and Zhu, (2007) proposed an algorithm for obtaining the piecewise linear solution path of the regularized optimization problem with the use of the above almost quadratic loss function. Essentially, piecewise linearity is derived from the condition that the second-order derivative of the loss function is locally constant. See examples 1 and 2 in the Appendix, where lists of popular almost quadratic functions are illustrated. Additionally, comlasso employs the property of the loss function and finds a solution set of (1) satisfying piecewise linearity. Therefore, comlasso is mainly based on conventional solution path algorithms. However, it monitors the residuals or margins between and X\mbox{\boldmath{\beta}} on \mbox{\boldmath{\beta}}\in\mbox{null}(D^{\top}) and exactly computes the dual parameters corresponding to the equality constraint D^{\top}\mbox{\boldmath{\beta}}={\bf 0}_{K}. This is the key step to implement the comlasso algorithm in a high-dimensional model.
The comlasso algorithm consists of four steps: determining the first active coefficients with maximum , identifying the direction of solution path for the active coefficients, computing the step size of the solution path by monitoring the violation of KKT conditions, and updating the active coefficients and dual parameters. The last three steps are repeated until the algorithm cannot update the set of active coefficients anymore.
2.1 KKT conditions and initialization of comlasso
The Lagrangian of primal problem (1) is given by
[TABLE]
where \mbox{\boldmath{\beta}}=(\beta_{1},\cdots,\beta_{p})^{\top}, \mbox{\boldmath{\mu}}=(\mu_{1},\cdots,\mu_{K})^{\top}\in\mathbb{R}^{K},
[TABLE]
and is the dual parameter corresponding to the equality constraint. The KKT conditions are as follows: there exists a vector (\mbox{\boldmath{\beta}},\mbox{\boldmath{\mu}}), so that
[TABLE]
where , \nabla L(\mbox{\boldmath{\beta}}) is the gradient vector of L(\mbox{\boldmath{\beta}}) and \nabla_{{\cal A}}L(\mbox{\boldmath{\beta}}) denotes (\nabla L(\mbox{\boldmath{\beta}}))_{{\cal A}}. Eqs. (6) and (8) are the stationarity and primal feasibility conditions, respectively. Eq. (7) is a stationarity condition with the existence of dual parameter , which plays a key role in comlasso. Note that, if it exists, the dual parameter is not necessarily unique.
Specifically, to describe the existence of , we introduce the dual feasible set of for a given and :
[TABLE]
Particularly, let \mbox{\boldmath{\beta}}={\bf 0}_{p}; then, is the feasible set of corresponding to the initial solution of comlasso. If we let , \mbox{\boldmath{\beta}}={\bf 0}_{p} is a trivial optimal solution of (1) for all . However, for \mbox{\boldmath{\beta}}={\bf 0}_{p} is not the solution of (1) anymore, because the corresponding does not exist. Therefore, comlasso sets as the initializing regularization parameter.
Obviously, is not empty if and only if
[TABLE]
for all , where . Note that and is as per the notation described above. Therefore, we have
[TABLE]
If there exists the unique triplet satisfying Eq. (9), the satisfying (8) is uniquely given by or . Furthermore, the sign of and is assigned by and , which will be explained in Section 3.3. The dual feasible set \mathcal{E}_{\lambda}(\mbox{\boldmath{\beta}}) will be used to re-check the violation of the KKT conditions.
Remark. Gaines et al., (2018) proposed a method to find an initialized solution under the constrained lasso, which includes problem (1) as a special case. However, it is not clearly shown whether one or more coefficients can be simultaneously activated on the null space of . In problem (1), two or more coefficients should be simultaneously activated in the initialization of the solution path algorithm. Additionally, all the signs of activated coefficients cannot be equal. When there are the multiple triplets in (9), comlasso chooses only one of them and activates the corresponding coefficients.
2.2 Finding the direction of the solution path
Rosset and Zhu, (2007) proposed a toolbox for the linear solution path. We explain the piecewise linearity of the solution path based on this toolbox, considering the regression problem for convenience. To emphasize the dependence on , we denote the solution of (4) by \mbox{\boldmath{\beta}}(\lambda)=(\beta_{1}(\lambda),\cdots,\beta_{p}(\lambda))^{\top}, and let the active variable and group variable index set be and \mathcal{K}=\{k:\|\mbox{\boldmath{\beta}}_{{\cal G}_{k}}(\lambda)\|_{1}\neq 0\}, respectively. By the KKT conditions in (6) and (8), we have the following two equations:
[TABLE]
where r_{i}(\lambda)=(y_{i}-{\bf x}_{i{\cal A}}^{\top}\mbox{\boldmath{\beta}}_{{\cal A}}(\lambda)). We set a candidate of solutions \mbox{\boldmath{\beta}}(\lambda-\delta) with as
[TABLE]
where and are the solutions of linear equation
[TABLE]
, and is the zero valued matrix. uniquely exists if is positive and definite and is full rank. It is easily verified that \mbox{\boldmath{\beta}}(\lambda-\delta) and \mbox{\boldmath{\mu}}_{{\cal K}}(\lambda-\delta) always satisfy
[TABLE]
and D^{\top}\mbox{\boldmath{\beta}}(\lambda-\delta)={\bf 0}_{p}, if \mbox{sign}(\mbox{\boldmath{\beta}}(\lambda))=\mbox{sign}(\mbox{\boldmath{\beta}}(\lambda-\delta)), , and for all . These two equations imply that \mbox{\boldmath{\beta}}(\lambda-\delta) and \mbox{\boldmath{\mu}}(\lambda-\delta) are respectively the primal and dual solutions of
[TABLE]
as far as there exists \mbox{\boldmath{\mu}}_{{\cal K}^{c}}(\lambda-\delta) so that
[TABLE]
Particularly, the existence of \mbox{\boldmath{\mu}}_{{\cal K}^{c}}(\lambda-\delta) is verified by the dual feasible set \mathcal{E}_{\lambda-\delta}(\mbox{\boldmath{\beta}}(\lambda-\delta)) defined in Section 2.1. Therefore, for \mbox{\boldmath{\beta}}(\lambda-\delta) and \mbox{\boldmath{\mu}}(\lambda-\delta) with , we can categorize the violations of the KKT conditions and updating rules as follows:
Rules for updating the activated primal and dual solutions
- (C1)
termination condition: if , then comlasso stops;
- (C2)
sign condition: if \mbox{\boldmath{\beta}}(\lambda-\delta)\neq\mbox{\boldmath{\beta}}(\lambda), then update ;
- (C3)
condition of piecewise constant in loss function: if
[TABLE]
then update ;
- (C4)
dual feasibility condition: if \mathcal{E}_{\lambda-\delta}(\mbox{\boldmath{\beta}}(\lambda-\delta))=\phi, then update and ;
- (C5)
stationarity condition: if there exists \mbox{\boldmath{\mu}}(\lambda-\delta), but
[TABLE]
for some , then update .
In applying the updating rules, the signs of the active coefficients are determined by the inequality of each condition. The determination of signs will be explained in the following subsection. If , , and \mbox{sign}(\mbox{\boldmath{\beta}}_{{\cal A}}) are obtained, then comlasso computes the direction of the solution path by solving (16) again and produces the solution path by increasing before one of the violations occurs. There is also the need to find the minimum that leads one of the violation events. We call the minimum the step size of the solution path.
Remark. In the classification problem, (6) in the KKT conditions is expressed as
[TABLE]
Additionally, is same as that in the regression case. Therefore, equation (16) is also applied to find in the classification problem.
2.3 Computation of the step size
Let \mbox{\boldmath{\beta}}(\lambda) and \mbox{\boldmath{\mu}}(\lambda) be the primal and dual solutions of (1), respectively, and the solution of (16). It is easy to verify whether the first two conditions, (C1) and (C2), are still valid for . The violation of condition (C3) is explained by Rosset and Zhu, (2007) in detail, and we call this violation event “hitting the knots.” As such, we explain the dual feasibility (C4) and stationarity conditions (C5) to determine the step size and sign of the new activated coefficients.
First, we assume that step size is determined by the violation in the dual feasibility condition (C4). The dual feasibility condition is written as
[TABLE]
for , and
[TABLE]
for all . Here, \nabla_{j}L(\mbox{\boldmath{\beta}}(\lambda-\delta))=\nabla_{j}L(\mbox{\boldmath{\beta}}(\lambda))+\sum_{i=1}^{n}\left(2a(r_{i}(\lambda))({\bf x}_{i})_{j}{\bf x}_{i{\cal A}}^{\top}{\bf m}(\lambda)\right)\delta,, being a linear function of . Therefore, the computation of step size is reduced to the problem of finding the maximum with {\cal E}_{\lambda-\delta}(\mbox{\boldmath{\beta}}(\lambda-\delta))\neq\phi, which can be solved by the following linear programming problem:
[TABLE]
for all . It is clear that the feasible set of problem (18) is not empty. In fact, solving (18) requires at most intersections of intervals.
Let be the solution of problem (18); then, there exist and so that
[TABLE]
which is equal to . The violation of the dual feasibility condition means that \mbox{\boldmath{\beta}}(\lambda-\delta)+\delta{\bf m}(\lambda-\delta) for is not the solution of (17) and
[TABLE]
for and in (19). The two equations above imply that and should be included in and to find a new direction of the solution path, respectively, and that the signs of the new activated coefficients are given by and .
Next, we assume that step size is determined by the violation in the stationarity condition (C5) and let step size be again. In this case, \mbox{\boldmath{\mu}}(\lambda-\delta^{*}) exists but, for some and ,
[TABLE]
or
[TABLE]
To find a new direction, we should include index in and let for the former case and for the latter.
2.4 Uniqueness of the solution path
As previously mentioned, the comlasso algorithm consists of the initialization, determination of the signed active set, and computation of step size. We summarize comlasso as follows:
comlasso algorithm
- •
Initialization
- –
Compute from (9) and let and \mbox{\boldmath{\beta}}(\lambda)={\bf 0}_{p};
- –
Initialize and and compute the corresponding dual parameter for ;
- –
Assign the sign of the active coefficients for .
- •
Repeat
- –
Compute and by solving (16);
- –
Compute based on the updating rules in Section 2.2:
- (a)
; 2. (b)
; 3. (c)
is the smallest of “hitting the knot;” 4. (d)
is computed by (18); 5. (e)
\delta_{5}=\{\delta\geq 0:|\nabla_{j}L(\mbox{\boldmath{\beta}}(\lambda-\delta))+(D\mbox{\boldmath{\mu}}(\lambda-\delta))_{j}|\leq(\lambda-\delta)\mbox{ for all }j\in{\cal A}^{c},k(j)\in{\cal K}\}.
- –
Let \mbox{\boldmath{\beta}}_{{\cal A}}(\lambda-\delta^{*})=\mbox{\boldmath{\beta}}_{{\cal A}}(\lambda)+\delta^{*}{\bf b}(\lambda) and \mbox{\boldmath{\mu}}_{{\cal K}}(\lambda-\delta^{*})=\mbox{\boldmath{\mu}}_{{\cal K}}(\lambda)+\delta^{*}{\bf m}(\lambda) and update , , , and the sign of \mbox{\boldmath{\beta}}_{{\cal A}}(\lambda);
- –
If , then stop.
As a special case of (1), we consider the problem (1). Let and be the index satisfying for all and for all . Then, the initializing regularization parameter is and the dual parameter associated with the equality constraint is uniquely given by . The following theorem shows that the solution path in problem (1) is unique and, thus, comlasso pursues the unique solution path.
Theorem 1**.**
If any , the submatrix of has full rank; for , there is the unique solution of the problem (1) and comlasso provides the solution path for .
Proof.
We fix and denote the dual solution of (1) by First, we show that is unique. Let \hat{}\mbox{\boldmath{\beta}} and \tilde{}\mbox{\boldmath{\beta}} be the solution of (1). Additionally, we set as another dual solution and let {\cal A}_{\hat{\mu}}=\{j:|X_{j}^{\top}(y-X\hat{}\mbox{\boldmath{\beta}})+\hat{\mu}|=\lambda\} and {\cal A}_{\tilde{\mu}}=\{l:|X_{l}^{\top}(y-X\tilde{}\mbox{\boldmath{\beta}})+\tilde{\mu}|=\lambda\}. From the strict convexity of the objective function, X\hat{}\mbox{\boldmath{\beta}}=X\tilde{}\mbox{\boldmath{\beta}} and X^{\top}(y-X\hat{}\mbox{\boldmath{\beta}})=X^{\top}(y-X\tilde{}\mbox{\boldmath{\beta}}) (see Lemma 1 in Tibshirani et al., (2013)). The dual function is always concave, so that for all also represent the dual solution. Assuming that is not empty,
[TABLE]
for which is constant. Therefore, must be empty if . However, we know that t\hat{}\mbox{\boldmath{\beta}}+(1-t)\tilde{}\mbox{\boldmath{\beta}} for is also the primal solution of (1), which is a contradiction that the intersection of the active sets of primal solutions \hat{}\mbox{\boldmath{\beta}} and t\hat{}\mbox{\boldmath{\beta}}+(1-t)\tilde{}\mbox{\boldmath{\beta}} must be empty. That is, is unique if is not empty.
From the uniqueness of the dual solution, the corresponding \hat{}\mbox{\boldmath{\beta}} is unique because any submatrix of has full rank (see Theorem 5 in Osborne et al., (2000)). This result implies that comlasso pursues the unique solution of (1) for .
3 Numerical examples
3.1 Toy example analysis
The comlasso algorithm can be applied to the compositional regression model estimation with adaptive lasso penalty (Zou,, 2006) through simple reparametrization. We applied the adaptive lasso to sand data analysis (Aitchison,, 1986) as a toy example of compositional regression. The sand data consist of 39 sediment samples, collected at different water depths. The sediment components are sand, silt, and clay, measured in percentages for each sample. Water depth is the response variable and the others are predictors in the compositional regression. The weights for adaptive penalty are the inverses of non-regularized estimates. We compare the solution paths and the regression coefficients of the lasso and adaptive lasso estimates chosen by Bayesian information criteria (BIC). We set the degree of freedom in BIC as the number of active coefficients subtracted from 1.
Figure 1 shows the solution paths of lasso (left) and adaptive lasso (right). The vertical line denotes the selected model using BIC. The non-regularized estimates corresponding to the predictor, clay, are relatively small. A large penalization on the predictor under adaptive lasso leads different solution paths from that of lasso.
3.2 Speed comparison
Comlasso adopts the idea of LARs Efron et al., (2004) and the piecewise linear toolbox (Rosset and Zhu,, 2007) that pursue the solution path under lasso. Compared with LARs, the computational cost of the comlasso algorithm is not significantly more expensive. Only more Lagrangian variables are used in the linear system to find the solution path directions and a maximum of additional arithmetic computations are required to obtain the step size. Compared with genlasso (Tibshirani and Taylor,, 2011), comlasso has a computational advantage in a high-dimensional regression model. In fact, genlasso was originally developed to solve the regression problem with the full ranked design matrix, so that it employs an additional penalty when . The penalty function is incorporated into the design matrix for the full rank design, which leads to a drastic increase in computational cost, especially when is significantly larger than (see Section 7 in Tibshirani and Taylor, (2011)). However, comlasso does not require the full ranked design matrix, nor suffer from such a computation problem.
We consider an optimization problem with quadratic loss and perform two simulations to compare the computational speeds of genlasso and comlasso. The simulation setting is similar to Lin et al., (2014). Here, for , are generated from the logarithm of the logistic normal distribution with (\mbox{\boldmath{\eta}},\Sigma), where \mbox{\boldmath{\eta}}_{{\cal G}_{k}}=\log(p_{k}/2){\bf 1}_{p_{k}} for , \mbox{\boldmath{\eta}}_{{\cal G}_{k}}={\bf 0}_{p_{k}} for , and is the block diagonal covariance matrix, whose diagonal block is given by for with for . The regression parameters are given by \mbox{\boldmath{\beta}}_{{\cal G}_{1}}=(1,-0.8,0.6,0,0,-1.5,-0.5,1.2,0\cdots,0)^{\top}\in\mathbb{R}^{p_{1}} and \mbox{\boldmath{\beta}}_{{\cal G}_{k}}={\bf 0}_{p_{k}} for , and s are independently generated from
For a fair comparison of computational time, we control for the maximum number of kinks in the solution path. Due to the use of the penalty function in the high-dimensional cases, genlasso produces less sparse solutions with more kinks, even when the degree of freedom achieves maximum in the primal problem. To prevent redundant computations for genlasso, we bind the maximum number of kinks for which comlasso achieves the maximum degree of freedom in the primal problem. Additionally, we do not include the computation time of reparametrization required for genlasso. We use R for the comlasso algorithm (see https://github.com/jenjong/ComLasso) and implement genlasso and comlasso using a PC with 3.8 Ghz CPU and 16 GB memory under a Win 10 operating system. Computation time is measured by 20 repetitions.
In the first simulation, we let and vary and . Table 1 presents the average computation times and standard errors for each and . Comlasso is efficient compared to genlasso, especially when is large. In addition, comlasso is less affected by than genlasso. This may be because the main computational burden in comlasso arises in the matrix inversion to determine direction.
In the second simulation, we fix and vary and . We let for . Table 2 summarizes the simulation results. Comlasso still performs well in the high-dimensional case. As increases, more computations are required for checking the violation of the KKT conditions (7), so that comlasso becomes slightly slower to produce the solution path. Genlasso seems to be faster as increases, that is, the dimension of increases for a fixed .
Remark. (comparison of sparsity) Since genlasso (Tibshirani and Taylor,, 2011) uses the penalty function, its solution is less sparse than the optimal solution. The sparse solution can be obtained by proper thresholding and most calibrated solutions are close to those of comlasso, at least in our experiments. However, there are no studies for the determination of the threshold level in this case.
3.3 Real data analysis
We apply the comlasso algorithm to estimate a classification model that predicts the incidence of bloodstream infections (BSI) in patients. We use the microbiome dataset (Montassier et al.,, 2016), which consists of 11 BSI patients and 17 non-BSI patients and amounts to 3837 operational taxonomic units (OTUs) of microbiomes, that is, pragmatic proxies for microbial “species,” identified by DNA sequencing. The 3837 OTUs belong to one of 111 genera, which in turn belong to 6 phyla. In the analysis, we exclude the two phyla and the corresponding genera and OTUs because they only have a single OTU. As a result, the 3833 OTUs belonging to the 107 genera and 4 phyla are used for further analysis.
We aggregate the amounts of OTUs in each genus and use them as covariates. That is, the data consist of one binary response and 107 genus-level amounts of OTUs. From this dataset, we compose two compositional datasets. The first one is to take the proportions of the 107 genus-level amounts of the OTUs (i.e. and ).
The second dataset considers the proportions between genera belonging to the same phylum. There are four types of phyla (Actinobacteria, Bacteroidetes, Firmicutes, and Proteobacteria), Which include 20, 10, 62, and 15 genera, respectively, and let the index set of genera denoting their phylum levels be for . Therefore, the dimensions of the compositional predictors are , and . The averages of genus-level aggregated OTUs in each phylum are , , , and , respectively, being quite different from each other. Hence, the proportions separately calculated for each phylum in the second dataset are quite different from the proportions among all genera in the first dataset. Note that the compositional regression model for the second dataset is a special case of the compositional regression model for the first dataset with additional constraints on the regression coefficients of each phylum. Therefore, using the proportions in each phylum as compositional predictors can reflect the information about the phylogenetic tree into the classification model.
We use the quadratic hinge loss (see the Appendix) to measure the predictive performance of the estimated models by leave-one-out cross validation. Additionally, we apply the stability selection procedure (Meinshausen and Bühlmann,, 2010) to assess the importance of predictors.
Figure 2 presents the leave-one-out errors of the three models, comlassoA (for the first dataset), comlassoB (for the second dataset,) and the non-regularized model (Montassier et al.,, 2016), where the model is fitted with the selected predictors by the Wilcoxon test in the first dataset. The figure illustrates that comlassoA and comlassoB show lower cross validation errors than the non-regularized model and that comlassoB tends to outperform comlassoA. Additionally, the choice of in the quadratic hinge loss does not affect the results significantly.
Figure 3 draws the entire solution paths for each phylum of comlassoA and comlassoB for . The horizontal axis denotes the value of \|\mbox{\boldmath{\beta}}_{{\cal G}_{k}}(\lambda)\|_{1}/\|\mbox{\boldmath{\beta}}_{{\cal G}_{k}}(\lambda_{\max})\|_{1}. Figure 3 also shows substantial differences among the estimated coefficients of the genera in the Proteobacteria phylum group. Specifically, for a large , that is, for a small \|\mbox{\boldmath{\beta}}(\lambda)\|_{1}/\|\mbox{\boldmath{\beta}}(\lambda_{max})\|_{1}, comlassoB gives more sparse coefficients at the phylum level than comlassoA. As previously pointed out, the average number of OTUs in each genus in the Proteobacteria group is significantly lower than the others. This means that the relative ratios seriously depend on the set where ratios are computed. Therefore, the coefficients corresponding to the Proteobacteria group are activated in an earlier step in comlassoA than comlassoB. This can be explained by a normalization of composite predictors, which reflects the structure of the phylum as prior knowledge of features.
Finally, to evaluate the importance of predictors we use the stability selection procedure (Meinshausen and Bühlmann,, 2010). Specifically, we use 1000 subsamples for half of the samples and fit the randomized lasso estimates with weakness parameter for each subsample. The regularization parameter of the randomized lasso is chosen by five-fold cross validation. Comlasso is applied to implement the randomized lasso algorithm. Table 1 summarizes the 10 genera with the highest selection probabilities in comlassoA and comlassoB, as well as the top-10 genera with the smallest p-values in the Wilcoxon test. Table 1 also shows that Lactobacillus was selected as the most important genus for both comlassoA and comlassoB. The Wilcoxon test also indicates that Lactobacillus is significant in the top-10 rank in terms of -value. Multiple literatures showed Lactobacillus can cause serious infections of the bloodstream (Cannon et al.,, 2005; Salminen et al.,, 2002).
Table 3 indicates that the multiple genus in the Proteobacteria phylum are selected by comlassoA and Wilcoxon test. However, comlassoB does not select any genus included in the phylum within the top-10 ranked predictors. As previously discussed, this selection pattern of comlassoB is partly owed to a normalization in compositional data. Considering predictive performance, we may conclude that it is worth investigating additional biological evidence based on the results of comlassoB under larger samples.
4 Concluding remarks
We developed a solution path algorithm for an regularized regression model with a compositional predictor in the high-dimensional case. Comlasso generalizes the equiangular direction of the LARs on a specific linear space and directly produces the entire solution path under the primal problem. Compared with genlasso, comlasso has the advantage of not only numerical precision but also computational efficiency in the high-dimensional regression model. Moreover, comlasso is easy to be extended to the regularized regression problem with almost quadratic loss, such as the Huberized loss and expectile loss functions. Because comlasso does not require continuously twice differentiability of the loss function, it can provide a new, efficient method for optimizing (1), where the solution path algorithm of Gaines et al., (2018) is not covered.
Additionally, we elaborated a way of monitoring the feasibility of the dual parameters corresponding to the equality constraints. In our case, verifying the condition requires only finite intersections of half intervals. Although we assume a special structure of the considered equality constraint, the proposed method to monitor dual feasibility can be extended to a general linear constraint, in which step size is computed by linear programming.
As an application of comlasso, we used the hinge loss for the classification problem and analyzed the microbiome dataset. We also considered two different types of constraints and compared the predictive performances of the considered models. The numerical results indicate that the regularized models show better performance than the non-regularized one. Additionally, we implemented stability selection to evaluate the importance of predictors.
Appendix A Appendix
A.1 Examples
Example 1**.**
Loss function in regression
- •
Quadratic loss is given by l(y,{\bf x}^{\top}\mbox{\boldmath{\beta}})=r^{2}/2, and, in this case, and .
- •
The asymmetric loss with is given by
[TABLE]
Where the corresponding constants are piecewise defined as for and for (Aigner et al.,, 1976).
- •
The Huberized loss function with a fixed knot is given by
[TABLE]
where the corresponding constants are piecewise defined as for and for (Rosset and Zhu,, 2007).
Example 2**.**
Loss function in classification
- •
The squared hinge loss function is given by l(y,{\bf x}^{\top}\mbox{\boldmath{\beta}})=\big{(}\max(0,1-r)\big{)}^{2}/2, in which for and for (Lee and Lin,, 2013).
- •
The Huberized loss function with a fixed knot is given by
[TABLE]
in which , , for and , , for , , for (Rosset and Zhu,, 2007).
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Aigner et al., (1976) Aigner, D. J., Amemiya, T., and Poirier, D. J. (1976). On the estimation of production frontiers: maximum likelihood estimation of the parameters of a discontinuous density function. International Economic Review , pages 377–396.
- 2Aitchison, (1986) Aitchison, J. (1986). The Statistical Analysis of Compositional Data . Chapman & Hall, Ltd., London, UK.
- 3Aitchison and Bacon-shone, (1984) Aitchison, J. and Bacon-shone, J. (1984). Log contrast models for experiments with mixtures. Biometrika , 71(2):323–330.
- 4Bondell and Reich, (2009) Bondell, H. D. and Reich, B. J. (2009). Simultaneous factor selection and collapsing levels in anova. Biometrics , 65(1):169–177.
- 5Brodie et al., (2009) Brodie, J., Daubechies, I., De Mol, C., Giannone, D., and Loris, I. (2009). Sparse and stable markowitz portfolios. Proceedings of the National Academy of Sciences , 106(30):12267–12272.
- 6Buccianti et al., (2006) Buccianti, A., Mateu-Figueras, G., and Pawlowsky-Glahn, V. (2006). Compositional data analysis in the geosciences: from theory to practice. Geological Society of London.
- 7Cannon et al., (2005) Cannon, J. P., Lee, T. A., Bolanos, J. T., and Danziger, L. H. (2005). Pathogenic relevance of lactobacillus: a retrospective review of over 200 cases. European Journal of Clinical Microbiology and Infectious Diseases , 24(1):31–40.
- 8Efron et al., (2004) Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). Least angle regression. Ann. Statist. , 32(2):407–499.
