A note on locally optimal designs for generalized linear models with restricted support
Osama Idais

TL;DR
This paper gives assumptions, at certain values of the regression parameters, under which a locally optimal experimental design for a generalized linear model without intercept can be derived from a locally optimal design for the corresponding model with intercept, and vice versa.
Contribution
It introduces assumptions that connect locally optimal designs for models with and without intercept, so that a locally optimal design known for one model yields a locally optimal design for the other.
Findings
Derived locally optimal designs for models with and without intercepts.
Applied methods to Poisson and logistic models.
Extended approaches to nonlinear models.
Abstract
Optimal designs for generalized linear models require prior knowledge of the regression parameters. At certain values of the parameters we propose particular assumptions which allow a locally optimal design for a model without intercept to be derived from a locally optimal design for the corresponding model with intercept and vice versa. Applications to Poisson and logistic models and extensions to nonlinear models are provided.
Institute for Mathematical Stochastics, Otto-von-Guericke University Magdeburg,
PF 4120, D-39016 Magdeburg, Germany
keywords:
approximate design, information matrix, model without intercept, optimal design, saturated design.
1 Introduction
The generalized linear model (GLM) is a generalization of ordinary linear regression which allows continuous or discrete observations from one-parameter exponential family distributions to be combined with explanatory variables (factors) via proper link functions (Nelder and Wedderburn (1972)). The GLM framework includes logistic, probit, Poisson and gamma models, among others (McCullagh and Nelder (1989) and Dobson and Barnett (2018)). GLMs are therefore widely applied in areas such as the social and educational sciences, clinical trials, insurance and industry.
The information matrix for a GLM depends on the model parameters. Locally optimal designs under GLMs are therefore derived at a given value of the parameters (Khuri et al. (2006), Atkinson and Woods (2015)). One way to reduce the complexity of deriving a locally optimal design for a GLM without intercept is to make use of an available locally optimal design for the corresponding GLM with intercept, and vice versa. This procedure was suggested in Heiligers and Hilgers (2003) to investigate the relation between optimal designs for mixture models and for component amount models. Their result was extended under linear models in Li et al. (2005) to derive a D-optimal design for a non-intercept linear model from that for a linear model with intercept. In contrast, Zhang and Wong (2013) provided specific conditions to derive D- and A-optimal designs for component amount models (with intercept) from the analogous optimal designs for the corresponding mixture models (without intercept). In this paper we generalize their approaches to GLMs under the D- and A-criteria and we introduce a more transparent proof based on the General Equivalence Theorem. This paper is organized as follows. In Section 2, the models and design optimality criteria are introduced. In Section 3, we present the main results, followed by applications to Poisson and logistic models in Section 4. Further extensions are given in Section 5.
2 Models and designs
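Let $Y_1,\dots,Y_n$ be independent response variables at $n$ experimental conditions $\boldsymbol{x}_1,\dots,\boldsymbol{x}_n$ which come from an experimental region $\mathcal{X}\subseteq\mathbb{R}^{\nu}$, i.e., $\boldsymbol{x}_i\in\mathcal{X}$, $1\le i\le n$. Under generalized linear models with the vector of model parameters $\boldsymbol{\beta}\in\mathbb{R}^{p}$ each observation $Y_i$ belongs to a one-parameter exponential family distribution with expected mean $\mu(\boldsymbol{x}_i,\boldsymbol{\beta})$ and variance $a(\phi)V\bigl(\mu(\boldsymbol{x}_i,\boldsymbol{\beta})\bigr)$ where $V(\cdot)$ is a mean-variance function and $a(\phi)$ is a dispersion parameter (see McCullagh and Nelder (1989), Section 2.2.2). Let $\boldsymbol{f}:\mathcal{X}\to\mathbb{R}^{p}$ be a $p$-dimensional regression function written as $\boldsymbol{f}(\boldsymbol{x})=\bigl(f_1(\boldsymbol{x}),\dots,f_p(\boldsymbol{x})\bigr)^{\sf T}$. To assure estimability of the parameters the components $f_1,\dots,f_p$ are assumed to be real-valued continuous linearly independent functions on $\mathcal{X}$. The expected mean $\mu$ is related to a linear predictor $\eta=\boldsymbol{f}^{\sf T}(\boldsymbol{x})\boldsymbol{\beta}$ via a one-to-one and differentiable link function $g$, i.e., $\eta=g(\mu)$, $\mu=\mu(\boldsymbol{x},\boldsymbol{\beta})$. We can define the intensity function for each point $\boldsymbol{x}\in\mathcal{X}$ as
\[ u(\boldsymbol{x},\boldsymbol{\beta})=\frac{1}{a(\phi)\,V\bigl(\mu(\boldsymbol{x},\boldsymbol{\beta})\bigr)}\Bigl(\frac{\mathrm{d}\mu}{\mathrm{d}\eta}\Bigr)^{2} \tag{2.1} \]
which is positive and depends on the value of the linear predictor (Atkinson and Woods (2015)). The Fisher information matrix for a GLM can be given in the form $\boldsymbol{M}(\boldsymbol{x},\boldsymbol{\beta})=u(\boldsymbol{x},\boldsymbol{\beta})\,\boldsymbol{f}(\boldsymbol{x})\boldsymbol{f}^{\sf T}(\boldsymbol{x})$ for all $\boldsymbol{x}\in\mathcal{X}$ (see Fedorov and Leonov (2013), Subsection 1.3.2). Define the function $\boldsymbol{f}_{\boldsymbol{\beta}}(\boldsymbol{x})=u^{\frac{1}{2}}(\boldsymbol{x},\boldsymbol{\beta})\,\boldsymbol{f}(\boldsymbol{x})$; then the Fisher information matrix may be rewritten as $\boldsymbol{M}(\boldsymbol{x},\boldsymbol{\beta})=\boldsymbol{f}_{\boldsymbol{\beta}}(\boldsymbol{x})\boldsymbol{f}_{\boldsymbol{\beta}}^{\sf T}(\boldsymbol{x})$ for each $\boldsymbol{x}\in\mathcal{X}$. The latter form is appropriate for other nonlinear models and will appear frequently in the paper. For the whole set of experimental conditions $\boldsymbol{x}_1,\dots,\boldsymbol{x}_n$ the Fisher information matrix is obtained as $\sum_{i=1}^{n}\boldsymbol{M}(\boldsymbol{x}_i,\boldsymbol{\beta})$.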
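To make the single-point information matrix concrete, the following Python sketch evaluates the intensity times the outer product of the regression vector for two familiar cases. The two intensity formulas are assumptions of the sketch (Poisson model with log link, binary model with logit link), and the function names and numerical values are hypothetical illustrations rather than anything taken from the paper.

```python
import numpy as np

def intensity_poisson(eta):
    # Assumption: Poisson response with log link, so the intensity is exp(eta).
    return np.exp(eta)

def intensity_logistic(eta):
    # Assumption: binary response with logit link, intensity exp(eta)/(1+exp(eta))^2.
    return np.exp(eta) / (1.0 + np.exp(eta)) ** 2

def fisher_information(f_x, beta, intensity):
    # Single-point information matrix, computed as f_beta f_beta^T
    # with f_beta = sqrt(intensity) * f(x).
    f_x = np.asarray(f_x, dtype=float)
    eta = f_x @ np.asarray(beta, dtype=float)   # linear predictor
    f_beta = np.sqrt(intensity(eta)) * f_x
    return np.outer(f_beta, f_beta)

# Hypothetical example: f(x) = (1, x1, x2)^T evaluated at x = (0.5, 0.25)
f_x = np.array([1.0, 0.5, 0.25])
beta = np.array([0.0, -2.0, -2.0])
M_pois = fisher_information(f_x, beta, intensity_poisson)
M_logit = fisher_information(f_x, beta, intensity_logistic)
```

Both matrices have rank one, which is why a design needs at least as many support points as parameters for its information matrix to be nonsingular.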
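In this article, we focus on approximate designs $\xi$ defined on the experimental region $\mathcal{X}$ with finite and mutually distinct support points $\boldsymbol{x}_1,\dots,\boldsymbol{x}_m\in\mathcal{X}$ and the corresponding weights $\omega_1,\dots,\omega_m>0$ such that $\sum_{i=1}^{m}\omega_i=1$ (see Silvey (1980), p.15). The set ${\rm supp}(\xi)=\{\boldsymbol{x}_1,\dots,\boldsymbol{x}_m\}$ is called the support of $\xi$. The information matrix of a design $\xi$ at a parameter point $\boldsymbol{\beta}$ is defined by
\[ \boldsymbol{M}(\xi,\boldsymbol{\beta})=\int_{\mathcal{X}}\boldsymbol{M}(\boldsymbol{x},\boldsymbol{\beta})\,\xi(\mathrm{d}\boldsymbol{x})=\sum_{i=1}^{m}\omega_i\,\boldsymbol{M}(\boldsymbol{x}_i,\boldsymbol{\beta}). \tag{2.2} \]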
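The information matrix of an approximate design is a weighted sum of single-point information matrices. A minimal Python sketch, assuming a one-factor logistic model with regression vector (1, x) and a hypothetical two-point design; the helper name `design_information` is not from the paper:

```python
import numpy as np

def design_information(points, weights, beta, regression, intensity):
    # Weighted sum of single-point information matrices over the support.
    weights = np.asarray(weights, dtype=float)
    if np.any(weights <= 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be positive and sum to one")
    beta = np.asarray(beta, dtype=float)
    M = np.zeros((beta.size, beta.size))
    for x, w in zip(points, weights):
        f = regression(x)
        M += w * intensity(f @ beta) * np.outer(f, f)
    return M

def regression(x):
    # Assumed regression function f(x) = (1, x)^T
    return np.array([1.0, x])

def logistic_intensity(eta):
    # Assumed intensity of the logistic model (logit link)
    return np.exp(eta) / (1.0 + np.exp(eta)) ** 2

# Hypothetical design: points -1 and 1 with equal weights, evaluated at beta = 0
M = design_information([-1.0, 1.0], [0.5, 0.5], [0.0, 0.0],
                       regression, logistic_intensity)
```

At the zero parameter vector the logistic intensity is 1/4 everywhere, so the result is a quarter of the information matrix of the corresponding linear model.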
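Optimal designs are derived under specific optimality criteria. Throughout, we restrict ourselves to the common D- and A-criteria. Denote by "$\det$" and "${\rm tr}$" the determinant and the trace of a matrix, respectively. A design $\xi^{*}$ is called locally D-optimal (at $\boldsymbol{\beta}$) if it minimizes $\det\bigl(\boldsymbol{M}^{-1}(\xi,\boldsymbol{\beta})\bigr)$ over all designs $\xi$ whose information matrix (at $\boldsymbol{\beta}$) is nonsingular. Similarly, a design $\xi^{*}$ is called locally A-optimal (at $\boldsymbol{\beta}$) if it minimizes ${\rm tr}\bigl(\boldsymbol{M}^{-1}(\xi,\boldsymbol{\beta})\bigr)$ over all designs $\xi$ whose information matrix (at $\boldsymbol{\beta}$) is nonsingular. The General Equivalence Theorem can be used to investigate the optimality of a design with respect to the D-criterion and the A-criterion (see Silvey (1980), p.40, p.48 and p.54). Let $\boldsymbol{\beta}$ be a given parameter point and let $\xi^{*}$ be a design with nonsingular information matrix $\boldsymbol{M}(\xi^{*},\boldsymbol{\beta})$. The design $\xi^{*}$ is locally D-optimal (at $\boldsymbol{\beta}$) if and only if
\[ u(\boldsymbol{x},\boldsymbol{\beta})\,\boldsymbol{f}^{\sf T}(\boldsymbol{x})\boldsymbol{M}^{-1}(\xi^{*},\boldsymbol{\beta})\boldsymbol{f}(\boldsymbol{x})\le p\quad\text{for all } \boldsymbol{x}\in\mathcal{X}. \tag{2.3} \]
The design $\xi^{*}$ is locally A-optimal (at $\boldsymbol{\beta}$) if and only if
\[ u(\boldsymbol{x},\boldsymbol{\beta})\,\boldsymbol{f}^{\sf T}(\boldsymbol{x})\boldsymbol{M}^{-2}(\xi^{*},\boldsymbol{\beta})\boldsymbol{f}(\boldsymbol{x})\le{\rm tr}\bigl(\boldsymbol{M}^{-1}(\xi^{*},\boldsymbol{\beta})\bigr)\quad\text{for all } \boldsymbol{x}\in\mathcal{X}. \tag{2.4} \]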
Remark.
Equality in (2.3) or (2.4) is attained at the support points of any locally D- or A-optimal design, respectively. The left hand side of each inequality is called the sensitivity function.
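The equivalence-theorem conditions can be checked numerically by evaluating the sensitivity functions on a grid. The sketch below assumes the standard forms of the two sensitivity functions (intensity times a quadratic form in the inverse, respectively squared inverse, information matrix) and uses the simplest special case as a hypothetical example: a straight-line linear model with constant intensity 1 on the interval [-1, 1]. A grid check of this kind is only a numerical indication, not a proof.

```python
import numpy as np

def regression(x):
    # Assumed regression function f(x) = (1, x)^T
    return np.array([1.0, x])

def intensity(eta):
    # Constant intensity 1: the ordinary linear model as a special case
    return 1.0

def information(points, weights, beta):
    return sum(w * intensity(regression(x) @ beta)
               * np.outer(regression(x), regression(x))
               for x, w in zip(points, weights))

def sensitivities(x, M_inv, beta):
    f = regression(x)
    u = intensity(f @ beta)
    d_value = u * (f @ M_inv @ f)           # D-criterion sensitivity
    a_value = u * (f @ M_inv @ M_inv @ f)   # A-criterion sensitivity
    return d_value, a_value

beta = np.zeros(2)
points, weights = [-1.0, 1.0], [0.5, 0.5]   # candidate design
M_inv = np.linalg.inv(information(points, weights, beta))

grid = np.linspace(-1.0, 1.0, 201)
values = np.array([sensitivities(x, M_inv, beta) for x in grid])
max_d, max_a = values[:, 0].max(), values[:, 1].max()

print("D: max sensitivity", max_d, "bound p =", len(beta))
print("A: max sensitivity", max_a, "bound tr(M^-1) =", np.trace(M_inv))
```

In this example both maxima are attained at the support points -1 and 1, in line with the remark above.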
3 Main results
In the following we distinguish between the model with an explicit intercept , say and the corresponding model without an explicit intercept , say. We modify our notation and thus these models; and are (without loss of generality) characterized in the following.
[TABLE]
and . Denote the intensity function by and let . Here we assume there is no constant (intercept) term explicitly involved in the present model, i.e., none of the regression components of the real-valued function is constant equal to . Denote and thus the information matrix of on under model is written as
[TABLE]
The corresponding model is defined by including the constant and the intercept parameter into the linear predictor of the generalized linear model as in the following.
[TABLE]
and . Denote the intensity function by and let . Denote the function . So we can write $u^{\frac{1}{2}}(\boldsymbol{x},\boldsymbol{\beta})\big(1,\boldsymbol{f}^{\sf T}(\boldsymbol{x})\big)^{\sf T}=\big(u^{\frac{1}{2}}(\boldsymbol{x},\boldsymbol{\beta}),\boldsymbol{f}^{\sf T}_{\boldsymbol{\beta}}(\boldsymbol{x})\big)^{\sf T}$. Define to be the set of all designs on for model such that and there exists a constant vector such that for all , i.e.,
[TABLE]
Then the information matrix of under model reads as
[TABLE]
In the following we give sufficient conditions under which the locally D- resp. A-optimal design at a parameter point for model can be obtained from the locally D- resp. A-optimal design from at a parameter point for the corresponding model by simply removing the origin point from its support points and renormalizing the weights of the remaining support points, and vice versa. To this end, for a design define on to be the conditional measure of given . So we get . Let denote the one-point design supported by the origin point , then we can write . Assume that for a given parameter point we have which yields and with . In particular, let then we find
[TABLE]
where
[TABLE]
Note that the submatrix is the information matrix of for model . Furthermore, where . Since there exists a constant vector such that for all , it is straightforward to verify the following
[TABLE]
As a result we get
[TABLE]
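The construction used above, removing the origin from a design and renormalizing the remaining weights, can be sketched in Python as follows. The sketch only illustrates the decomposition of a design into a one-point design at the origin and the conditional design on the remaining support, together with the resulting additive split of the information matrix. The Poisson intensity, the parameter values and the helper names are assumptions made for illustration.

```python
import numpy as np

def regression(x):
    # Model with intercept: regression vector (1, x^T)^T
    return np.concatenate(([1.0], x))

def information(points, weights, beta):
    # Assumption: Poisson model with log link, intensity exp(eta)
    M = np.zeros((len(beta), len(beta)))
    for x, w in zip(points, weights):
        f = regression(np.asarray(x, dtype=float))
        M += w * np.exp(f @ beta) * np.outer(f, f)
    return M

def split_at_origin(points, weights):
    # Return the origin weight and the conditional design on the other points.
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    at_origin = np.all(points == 0.0, axis=1)
    omega = weights[at_origin].sum()
    return omega, points[~at_origin], weights[~at_origin] / (1.0 - omega)

# Hypothetical design on three points, including the origin
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
weights = np.array([0.2, 0.5, 0.3])
beta = np.array([0.5, -2.0, -2.0])

omega, pts_rest, w_rest = split_at_origin(points, weights)
M_full = information(points, weights, beta)
M_origin = information([np.zeros(2)], [1.0], beta)
M_rest = information(pts_rest, w_rest, beta)   # singular: only two support points
```

Here `M_full` equals `omega * M_origin + (1 - omega) * M_rest`, which is the linearity of the information matrix in the design measure.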
Lemma 3.1.
Consider a design for model . Let a parameter point be given such that for all and . Then the design is locally D-optimal (at ) if it assigns weight $\omega=1/(\nu+1)$ to the origin $\boldsymbol{0}$.
Proof.
Under the assumptions given in the lemma we obtain from (3.1). Then the sensitivity function obtained from condition (2.3) of the Equivalence Theorem is given by
[TABLE]
Since we have $\psi(\boldsymbol{0},\xi^{*})=\tilde{u}_{0}\big(\omega\tilde{u}_{0}\big)^{-1}$ and according to the Remark in Section 2 is locally D-optimal if $\tilde{u}_{0}\big(\omega\tilde{u}_{0}\big)^{-1}=\nu+1$ which holds true if $\omega=1/(\nu+1)$. ∎
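The value of the D-sensitivity function at the origin, which reduces to the reciprocal of the origin weight, can be checked numerically. The sketch below assumes a hypothetical setting: two factors, a Poisson model with log link, non-origin support points on the hyperplane where the coordinates sum to one (so the constant vector is the vector of ones), and arbitrary weights on those points.

```python
import numpy as np

nu = 2
beta = np.array([0.3, -2.0, -2.0])   # hypothetical parameter point (with intercept)
rest_points = np.eye(nu)             # unit vectors: coordinates sum to one
rest_weights = np.array([0.6, 0.4])  # arbitrary weights of the conditional design

def regression(x):
    # Model with intercept: regression vector (1, x^T)^T
    return np.concatenate(([1.0], x))

def intensity(eta):
    # Assumption: Poisson model with log link
    return np.exp(eta)

def origin_sensitivity(omega):
    # D-sensitivity at the origin for the design with origin weight omega.
    points = np.vstack([np.zeros(nu), rest_points])
    weights = np.concatenate(([omega], (1.0 - omega) * rest_weights))
    M = sum(w * intensity(regression(x) @ beta)
            * np.outer(regression(x), regression(x))
            for x, w in zip(points, weights))
    f0 = regression(np.zeros(nu))
    return intensity(f0 @ beta) * (f0 @ np.linalg.inv(M) @ f0)

omegas = [0.2, 1.0 / 3.0, 0.5]
values = [origin_sensitivity(w) for w in omegas]
```

Only the origin weight 1/3 makes the sensitivity at the origin equal to the number of parameters, 3, as required for equality at a support point of a D-optimal design.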
Theorem 3.1.
Consider a design for model . Let the design on be the conditional measure of given . Let a parameter point be given such that for all . Assume that and . Let . Then
(1) If is locally D-optimal (at ) for model then is locally D-optimal (at ) for model .
(2) If is locally D-optimal (at ) for model and
[TABLE]
then is locally D-optimal (at ) for model .
Proof.
Ad (1) Let be locally D-optimal (at ) on for model . We want to prove that on is locally D-optimal (at ) for model . By condition (2.3) of the Equivalence Theorem we have at that
[TABLE]
where, at , and for all with . So $\boldsymbol{M}^{-1}\big(\xi^{*},\boldsymbol{\beta}\big)=\boldsymbol{M}^{-1}\big(\xi^{*},\boldsymbol{\tilde{\beta}}\big)$ which is given by (3.1) with . Then inequality (3.3) is equivalent to
[TABLE]
Elementary computations show that the above inequality is equivalent to
[TABLE]
and so is locally D-optimal (at ) by condition (2.3) of the Equivalence Theorem.
Ad (2) Let on be locally D-optimal (at ) for model . Under the assumptions stated in the theorem, to show that from on is locally D-optimal (at ) for model we investigate condition (2.3) of the Equivalence Theorem, which is given above by (3.3) and is also equivalent to (3.4) at . Hence, (3.4) holds true by condition (3.2). Moreover, because is locally D-optimal, inequality (3.2) becomes an equality at each design point of , which is also a design point of , and since the equality also holds at the origin point . ∎
Next we introduce an analogous result for A-optimality. As ${\rm tr}\big(\boldsymbol{c}\boldsymbol{c}^{\sf T}\big)=\boldsymbol{c}^{\sf T}\boldsymbol{c}$ we obtain from (3.1)
[TABLE]
Also from (3.1) we get
[TABLE]
Lemma 3.2.
Consider a design for model . Let a parameter point be given such that for all and . Denote $\widetilde{\tau}={\rm tr}\bigl(\boldsymbol{\tilde{M}}^{-1}(\xi_{-\boldsymbol{0}}^{*},\boldsymbol{\tilde{\beta}})\bigr)$. Then the design is locally A-optimal (at ) if it assigns weight , below, to the origin ;
[TABLE]
Proof.
Under the assumptions given in the lemma we obtain from (3.6). Then the sensitivity function obtained from condition (2.4) of the Equivalence Theorem is given by
[TABLE]
Since we have $\psi(\boldsymbol{0},\xi^{*})=\tilde{u}_{0}(\boldsymbol{c}^{\sf T}\boldsymbol{c}+1)\big(\omega\tilde{u}_{0}\big)^{-2}$ and according to the Remark in Section 2 is locally A-optimal if $\tilde{u}_{0}(\boldsymbol{c}^{\sf T}\boldsymbol{c}+1)\big(\omega\tilde{u}_{0}\big)^{-2}={\rm tr}\bigl(\boldsymbol{M}^{-1}(\xi^{*},\boldsymbol{\tilde{\beta}})\bigr)$ which holds true if . By (3.5) we get ∎
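The closed form of the A-sensitivity at the origin used in this proof, the origin intensity times (c'c + 1) divided by the squared product of origin weight and origin intensity, can be checked numerically. The sketch assumes a hypothetical setting: two factors, a Poisson model with log link, and non-origin support points given by the unit vectors, so that the constant vector c is the vector of ones.

```python
import numpy as np

nu = 2
beta = np.array([0.3, -2.0, -2.0])   # hypothetical parameter point (with intercept)
c = np.ones(nu)                      # c^T x = 1 on the non-origin support
rest_points = np.eye(nu)
rest_weights = np.array([0.6, 0.4])  # arbitrary weights of the conditional design

def regression(x):
    # Model with intercept: regression vector (1, x^T)^T
    return np.concatenate(([1.0], x))

def intensity(eta):
    # Assumption: Poisson model with log link
    return np.exp(eta)

def origin_a_sensitivity(omega):
    # A-sensitivity at the origin for the design with origin weight omega.
    points = np.vstack([np.zeros(nu), rest_points])
    weights = np.concatenate(([omega], (1.0 - omega) * rest_weights))
    M = sum(w * intensity(regression(x) @ beta)
            * np.outer(regression(x), regression(x))
            for x, w in zip(points, weights))
    M_inv = np.linalg.inv(M)
    f0 = regression(np.zeros(nu))
    return intensity(f0 @ beta) * (f0 @ M_inv @ M_inv @ f0)

u0 = intensity(beta[0])              # intensity at the origin
omegas = np.array([0.2, 0.35, 0.5])
numeric = np.array([origin_a_sensitivity(w) for w in omegas])
closed_form = u0 * (c @ c + 1.0) / (omegas * u0) ** 2
```

The arrays `numeric` and `closed_form` agree for every origin weight, independently of the weights chosen on the remaining support points.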
Theorem 3.2.
Consider the assumptions and notation of Theorem 3.1 with $\widetilde{\tau}={\rm tr}\bigl(\boldsymbol{\tilde{M}}^{-1}(\xi_{-\boldsymbol{0}}^{*},\boldsymbol{\tilde{\beta}})\bigr)$. Let
[TABLE]
Denote the following equations
[TABLE]
Then
(1) If is locally A-optimal (at ) for model and for all then is locally A-optimal (at ) for model .
(2) If is locally A-optimal (at ) for model and
[TABLE]
then is locally A-optimal (at ) for model .
Proof.
Ad (1) Let on be locally A-optimal (at ) for model . We want to prove that on is locally A-optimal (at ) for model . Then condition (2.4) of the Equivalence Theorem guarantees at that for all
[TABLE]
where, at , and for all with . So $\boldsymbol{M}^{-2}\big(\xi^{*},\boldsymbol{\beta}\big)=\boldsymbol{M}^{-2}\big(\xi^{*},\boldsymbol{\tilde{\beta}}\big)$ which is given by (3.6) with . Then the l.h.s. of inequality (3.8) equals
[TABLE]
and together with (3.5) it is straightforward to see that (3.8) is equivalent to
[TABLE]
Since for all , (3.9) is equivalent to
[TABLE]
and so is locally A-optimal (at ) by condition (2.4) of the Equivalence Theorem.
Ad (2) Let on be locally A-optimal (at ) for model . Under the assumptions stated in the theorem, to show that from on is locally A-optimal (at ) for model we investigate condition (2.4) of the Equivalence Theorem, which is given above by (3.8) and is also equivalent to (3.9) at for all . Hence, it is straightforward to see that (3.9) for all holds true by condition (3.7). Moreover, because is locally A-optimal and for all , inequality (3.7) becomes an equality at each design point of , which is also a design point of . Since and the equality also holds at the origin point . ∎
Remark.
The results of this section might be viewed as a generalization of the results of both Li et al. (2005) and Zhang and Wong (2013) that were derived under linear models, i.e., when the intensities are constants equal to 1.
Remark.
A design with minimal support, i.e., whose support size equals the dimension of (), is called a saturated design. In fact, the assumption for all is equivalent to the condition that for all lies on a hyperplane. Thus every saturated design for generalized linear models without intercept satisfies that assumption. Moreover, the assumption for all is satisfied when is given by the -dimensional unit simplex, i.e., . In such a case the mixture constraint of which is given by entails that .
4 Applications
4.1 Poisson models
We consider a first order Poisson model with . The intensity functions under and are given by
[TABLE]
respectively. Note that factorizes, i.e., . Therefore, for any given parameter point . That means the design is independent of and hence locally optimal designs for a Poisson model with intercept are governed by . A similar situation holds under the Rasch Poisson-Gamma counts model (Graßhoff et al. (2013)) in item response theory and the Rasch Poisson counts model (Graßhoff et al. (2018)).
Relevant work in the literature includes Russell et al. (2009), who derived a locally D-optimal saturated design for a first order Poisson model with intercept on where at . The support is given by and the -dimensional unit vectors with equal weights . So under the assumptions of Theorem 3.1, part (1) with as the -vector of ones, the design on is locally D-optimal at for the corresponding model without intercept.
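The passage from the design with intercept to the design without intercept can be illustrated numerically. The sketch below rests on several assumptions that are not confirmed by the text above: two factors, the unit square as experimental region, all slope parameters equal to -2, an arbitrary intercept value, and a candidate design on the origin and the unit vectors with equal weights. It evaluates the D-sensitivity function on a grid for both models; such a grid check is a numerical indication only.

```python
import itertools
import numpy as np

nu = 2
grid_1d = np.linspace(0.0, 1.0, 41)   # assumed region: the unit square
grid = [np.array(x) for x in itertools.product(grid_1d, repeat=nu)]

def with_intercept(x):
    return np.concatenate(([1.0], x))  # regression vector (1, x^T)^T

def without_intercept(x):
    return np.asarray(x, dtype=float)  # regression vector x

def max_d_sensitivity(points, weights, beta, regression):
    # Largest value of the D-sensitivity on the grid (Poisson, log link).
    M = sum(w * np.exp(regression(x) @ beta)
            * np.outer(regression(x), regression(x))
            for x, w in zip(points, weights))
    M_inv = np.linalg.inv(M)
    return max(np.exp(regression(x) @ beta)
               * (regression(x) @ M_inv @ regression(x)) for x in grid)

beta = np.array([1.0, -2.0, -2.0])     # assumed values; intercept 1.0 is arbitrary
beta_tilde = beta[1:]                  # parameter point without intercept

# Candidate design with intercept: origin and unit vectors, equal weights
points = np.vstack([np.zeros(nu), np.eye(nu)])
weights = np.full(nu + 1, 1.0 / (nu + 1))
max_with = max_d_sensitivity(points, weights, beta, with_intercept)

# Remove the origin and renormalize the remaining weights
points_rest = points[1:]
weights_rest = weights[1:] / (1.0 - weights[0])
max_without = max_d_sensitivity(points_rest, weights_rest, beta_tilde,
                                without_intercept)

print(max_with, "vs bound", nu + 1)
print(max_without, "vs bound", nu)
```

On this grid the sensitivity function reaches the number of parameters (3 with intercept, 2 without) at the support points and stays below it elsewhere, in line with Theorem 3.1, part (1). Changing the intercept value leaves the result unchanged, which reflects the factorization of the Poisson intensity.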
4.2 Logistic models
Consider a first order logistic model with . The intensity functions under and are given by
[TABLE]
respectively. Note that and at .
Kabera et al. (2015), Theorem 3.2, provided a three-point locally D-optimal saturated design at , for the two-factor logistic model on the experimental region . The support is given by where is the unique solution for to the equation . Hence, the assumptions of Theorem 3.1, part (1) with are satisfied, and the design on is locally D-optimal (at ) with equal weights for the corresponding model without intercept.
See also Example 3 in Schmidt and Schwabe (2017) where product type designs are locally D-optimal at for logistic models with intercept.
5 Extensions
The results obtained in Section 3 under generalized linear models can also be applied to other nonlinear models that are defined by
[TABLE]
In this context we define to be the gradient vector of , i.e.,
[TABLE]
The Fisher information matrix at a point is given by . Nonlinear models of the form (5.1) have been discussed in detail in the literature (see Ford et al. (1989), Atkinson and Haines (1996)). Here, generally, a nonlinear model explicitly includes an intercept term if the function includes the constant (see Schwabe (1995), Li and Balakrishnan (2011), Rodríguez et al. (2015), He (2018)). In Dette et al. (2008) some dose–response nonlinear models with intercept were listed, e.g.,
[TABLE]
The above nonlinear models were also considered in Dette et al. (2010) and locally D-optimal designs on the experimental region were derived under zero intercept, i.e., . The support is given by with equal weights where is obtained analytically.
In analogy to the results derived under GLMs in Section 3 we denote and we can write the Fisher information matrix of on under a non-intercept nonlinear model as , while the Fisher information matrix of on under a nonlinear model with intercept is $\boldsymbol{M}(\xi,\boldsymbol{\beta})=\int_{\mathcal{X}}\big(1,\boldsymbol{f}_{\boldsymbol{\beta}}^{\sf T}(\boldsymbol{x})\big)^{\sf T}\big(1,\boldsymbol{f}_{\boldsymbol{\beta}}^{\sf T}(\boldsymbol{x})\big)\,\xi(\mathrm{d}\boldsymbol{x})$. The following results are immediate.
Corollary 5.1.
Let the design be defined on such that . Let the design on be the conditional measure of given such that . Let a parameter point be given such that for all with . Assume there exists a constant vector such that for all . Let . Then
(1) If is locally D-optimal (at ) for model with intercept then is locally D-optimal (at ) for the corresponding model without intercept.
(2) If is locally D-optimal (at ) for model without intercept and
[TABLE]
then is locally D-optimal (at ) for the corresponding model with intercept.
Corollary 5.2.
Consider the assumptions and notation of Corollary 5.1 with $\widetilde{\tau}={\rm tr}\bigl(\boldsymbol{\tilde{M}}^{-1}(\xi_{-\boldsymbol{0}}^{*},\boldsymbol{\tilde{\beta}})\bigr)$. Let
[TABLE]
Denote the following equations
[TABLE]
Then
(1) If is locally A-optimal (at ) for a model with intercept and for all then is locally A-optimal (at ) for the corresponding model without intercept.
(2) If is locally A-optimal (at ) for a model without intercept and
[TABLE]
then is locally A-optimal (at ) for the corresponding model with intercept.
Remark.
In view of the assumptions of the previous corollaries is given by (3.1) where vanishes. That is due to , thus .
References
- Atkinson, A., Haines, L., 1996. Designs for nonlinear and generalized linear models, in: Ghosh, S., Rao, C. (Eds.), Design and Analysis of Experiments. Volume 13 of Handbook of Statistics. Elsevier, Amsterdam, pp. 437–475.
- Atkinson, A.C., Woods, D.C., 2015. Designs for generalized linear models, in: Dean, A., Morris, M., Stufken, J., Bingham, D. (Eds.), Handbook of Design and Analysis of Experiments. Chapman & Hall/CRC Press, Boca Raton, pp. 471–514.
- Dette, H., Bretz, F., Pepelyshev, A., Pinheiro, J., 2008. Optimal designs for dose-finding studies. Journal of the American Statistical Association 103, 1225–1237.
- Dette, H., Kiss, C., Bevanda, M., Bretz, F., 2010. Optimal designs for the emax, log-linear and exponential models. Biometrika 97, 513–518.
- Dobson, A.J., Barnett, A.G., 2018. An Introduction to Generalized Linear Models. 4th ed., CRC Press, Boca Raton.
- Fedorov, V.V., Leonov, S.L., 2013. Optimal Design for Nonlinear Response Models. CRC Press, Boca Raton.
- Ford, I., Titterington, D.M., Kitsos, C.P., 1989. Recent advances in nonlinear experimental design. Technometrics 31, 49–60.
- Graßhoff, U., Holling, H., Schwabe, R., 2013. Optimal design for count data with binary predictors in item response theory, in: Ucinski, D., Atkinson, A.C., Patan, M. (Eds.), mODa 10 - Advances in Model-Oriented Design and Analysis. Springer International Publishing, Heidelberg, pp. 117–124.
