Detection of latent heteroscedasticity and group-based regression effects in linear models via Bayesian model selection
Thomas A. Metzger, Christopher T. Franck

TL;DR
This paper introduces Bayesian model selection methods to detect hidden group structures and heteroscedasticity in linear models, improving the understanding of complex relationships among categorical predictors.
Contribution
It extends the concept of latent grouping factors to general linear models and provides Bayesian approaches to identify hidden structures and interactions.
Findings
Reveals latent group-based heteroscedasticity in data
Detects hidden groupings among categorical predictor levels
Improves model accuracy by accounting for complex structures
Abstract
Standard linear modeling approaches make potentially simplistic assumptions regarding the structure of categorical effects that may obfuscate more complex relationships governing data. For example, recent work focused on the two-way unreplicated layout has shown that hidden groupings among the levels of one categorical predictor frequently interact with the ungrouped factor. We extend the notion of a "latent grouping factor" to linear models in general. The proposed work allows researchers to determine whether an apparent grouping of the levels of a categorical predictor reveals a plausible hidden structure given the observed data. Specifically, we offer Bayesian model selection-based approaches to reveal latent group-based heteroscedasticity, regression effects, and/or interactions. Failure to account for such structures can produce misleading conclusions. Since the presence of latent…
| Class | Parameters |
|---|---|
| I (Null) | |
| II (SLR) | |
| III (ANCOVA) | , , |
| IV (Group-Contracted ANCOVA) | , , |
| V (Interaction ANCOVA) | , , , |
| VI (Group-Interaction ANCOVA) | , , , |
| VII (Heteroscedastic | , , |
| Group-Contracted ANCOVA) | , |
| VIII (Heteroscedastic | , , , |
| Group-Interaction ANCOVA) | , |
| Class | Parameters |
|---|---|
| I (Additive Model) | , , |
| II (Group-by-Column | , , |
| Interaction) | , |
| III (Heteroscedastic | , , |
| Additive) | , |
| IV (Heteroscedastic Group- | , , |
| by-Column Interaction) | , , |
| Class | Parameters |
|---|---|
| I (Additive Model) | , , |
| II (Group-by-Column | , , |
| Interaction) | , |
| III (Heteroscedastic | , , |
| Additive) | , |
| IV (Heteroscedastic Group- | , , |
| by-Column Interaction) | , , |
| Class | Parameters |
|---|---|
| I (Additive Model) | , , |
| II (Group-by-Column | , , |
| Interaction) | , |
| III (Heteroscedastic | , , |
| Additive) | , |
| IV (Heteroscedastic Group- | , , |
| by-Column Interaction) | , , |
| Class | Parameters |
|---|---|
| I (Additive Model) | , , |
| II (Group-by-Column | , , |
| Interaction) | , |
| III (Heteroscedastic | , , |
| Additive) | , |
| IV (Heteroscedastic Group- | , , |
| by-Column Interaction) | , , |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Detection of latent heteroscedasticity and group-based regression effects in linear models via Bayesian model selection
Thomas A. Metzger
Department of Statistics, Virginia Tech
and
Christopher T. Franck
Department of Statistics, Virginia Tech
Abstract
Standard linear modeling approaches make potentially simplistic assumptions regarding the structure of categorical effects that may obfuscate more complex relationships governing data. For example, recent work focused on the two-way unreplicated layout has shown that hidden groupings among the levels of one categorical predictor frequently interact with the ungrouped factor. We extend the notion of a “latent grouping factor” to linear models in general. The proposed work allows researchers to determine whether an apparent grouping of the levels of a categorical predictor reveals a plausible hidden structure given the observed data. Specifically, we offer Bayesian model selection-based approaches to reveal latent group-based heteroscedasticity, regression effects, and/or interactions. Failure to account for such structures can produce misleading conclusions. Since the presence of latent group structures is frequently unknown a priori to the researcher, we use fractional Bayes factor methods and mixture -priors to overcome lack of prior information.
Keywords: fractional Bayes factor, mixture g-priors, model selection, hidden additivity, latent grouping
1 Introduction
Linear models with categorical predictors are among the most frequently used statistical models, but oversimplification of the variance or regression effect structures can misrepresent key relationships within observed data. Figure 1 shows three relevant data sets. First, the left panel shows a simulated one-way ANOVA experiment, based on the statistics reported in Welch, (1951). The horizontal axis represents the levels of a treatment factor, and the vertical axis represents a continuous response. The center panel is an analysis of covariance (ANCOVA) that analyzes the breaking strength of a starch chip (Flurry,, 1939). The horizontal axis represents the chip’s thickness in inches, the vertical axis represents the breaking strength in grams, and the point shapes represent the plant from which the starch was derived. Finally, the right panel is an unreplicated two-way layout, measuring the genomic hybridization signal in dogs with lymphoma (Franck et al.,, 2013). The horizontal axis represents whether the sample was taken from normal or tumor tissue, the vertical axis represents the intensity of the genomic hybridization signal, and the individual lines represent six dogs studied. In each case, the levels of the categorical predictor appear to fall into one of two groups, although the group structure is unknown before data collection. The apparent group structure for the data in Figure 1 is represented by dark and light gray.
Situations similar to those illustrated in Figure 1 often arise in research. Perhaps after plotting the data or in reviewing previous related work, the researcher begins to suspect that there is a hidden grouping within the levels of the categorical predictor. We call this predictor the suspected latent grouping factor, or SLGF. This work aims to determine whether the latent groupings are plausible and how the group structure affects the response.
We must consider the SLGF with two key ideas in mind: first, that the SLGF might manifest itself through one of several structures in the data, namely, (i) group-specific regression effects, (ii) group-specific variances, and/or (iii) hidden interactions between groups and other model predictors. There are eight possible combinations of structures. Second, the specific assignment of levels to groups is unknown. Both of these aspects of the SLGF must be learned from the data in an unsupervised fashion: specifically, our proposed method uses Bayesian model selection to assess the plausibility of the SLGF’s impact on the data based on posterior probability. Formal detail can be found in Section 2.
Common linear model assumptions include homoscedasticity and a unique effect of the response at each level of categorical predictors; for a thorough review see Berry, (1993). Violations of these assumptions can have a wide variety of negative consequences on inference; see Rencher and Schaalje, (2008), Deschamps, (1991) and Scheffé, (1959). Model misspecification can lead to additional problems; see Rencher and Schaalje, (2008), Rao, (1971), and Deegan, (1976).
Regarding the detection of heteroscedasticity, many analyses follow a two-stage approach. These include the methods proposed by Bartlett, (1937), Levene, (1960), Brown and Forsythe, (1974), and Hartley, (1950). When heteroscedasticity is believed to be a function of a continuous predictor, many methods are available, including Breusch and Pagan, (1979), Cook and Weisberg, (1983), White, (1980), Glejser, (1969), Park, (1966), Box and Hill, (1974), Bickel, (1978), Jobson and Fuller, (1980), and Carroll and Ruppert, (1982).
Regarding inference in the presence of heteroscedasticity, methods include those of Box and Cox, (1964), Carroll and Ruppert, (1988), Perthes, (1855); Morrison, (1983), Huber, (1967), Eicker, (1967), Cragg, (1983), Hildreth and Houck, (1968), Long and Ervin, (2000), Polasek et al., (1998), Polasek and Pötzelberger, (1994), Cuervo and Achcar, (2009), Boscardin and Gelman, (1994), MacKinnon and White, (1985), Dumitrascu et al., (2015), and White, (1980); for a review see Hayes and Cai, (2007) and Cribari-Neto and Zarkos, (1999). Many methods exist to detect heteroscedasticity or conduct inference in its presence. To our knowledge, ours is the first proposal of latent group-based heteroscedasticity alongside possibly unique regression effects and/or hidden interactions.
While an extension to more than two groups is natural, our choice of two groups is still a reasonable approach in many problems; Kharrati-Kopaei and Sadooghi-Alvandi, (2007) and Franck, (2018) study factor level groupings based on two groups in unreplicated two-way layouts, while Goldfeld and Quandt, (1965) model heteroscedasticity as a function of two groups, where groups are created by partitioning observations ordinally.
The main contribution of this work is to propose a method of probabilistically detecting the presence of hidden categorical level groupings, and to describe the model specifications that capture the effect of such groupings, through the use of Bayesian model selection. This work generalizes the notion of latent group-based effects from the two-way layout to linear models; see Kharrati-Kopaei and Sadooghi-Alvandi, (2007); Franck et al., (2013); Franck and Osborne, (2016), and Franck, (2018). There is no unified Bayesian model selection approach to account for these structures in the context of latent groupings of the levels of a categorical predictor in general. Although we illustrate our method in the contexts of the three specific settings shown in Figure 1, our proposal is flexible enough to be used in the context of any linear model with a categorical predictor.
The remaining structure of this paper is as follows. Section 2 describes the candidate models in the context of the SLGF, as well as the fractional Bayes factor and Bayesian model selection details. Section 3 describes our proposed Bayesian model specification in contexts of ANCOVA models and unreplicated two-way layouts. Section 4 describes a simulation study to assess the performance of our method. Section 5 applies our proposed method to the empirical data sets of Figure 1. Section 6 summarizes the proposed method and provides some additional comments. Additional mathematical details and simulation results for one-way ANOVA data are provided in the supplement.
2 Proposed Method
2.1 Specification of Linear Models with Categorical Predictors
We begin the development of our approach by elucidating the assignment of the levels of the SLGF into groups. As an illustration, consider the data analyzed by Franck et al., (2013), shown in the right panel of Figure 1: an unreplicated two-way layout with 6 rows and 2 columns. We choose the dogs as the SLGF. Three examples of possible SLGF level assignments are shown in Figure 2:
For a SLGF with levels , let . We formally define a grouping scheme, denoted , as a partitioning that assigns the data into two groups based on each observation’s corresponding level of the SLGF. Let be the set of all possible schemes, and index the possible schemes. We refer to the example scheme 3, in the rightmost panel of Figure 2, to motivate the subsequent notational definitions; this example partitions levels k=\1, 4, and 5 separately from 2, 3, and 6. For a scheme , a colon separates the levels each group comprises; for example, the scheme of example 3 is denoted as . The most effective method to partition the levels of the SLGF into groups depends on the size and nature of the study in question. Many problems have small enough that a combinatoric search over all possible grouping schemes is reasonable; see Franck et al., (2013), Franck and Osborne, (2016), Franck, (2018), and Kharrati-Kopaei and Sadooghi-Alvandi, (2007). We use the combinatoric search approach exclusively in this study.
Next we formalize the idea of model structures in the context of specific linear models. Recall from Section 1 that we must potentially accommodate eight model structures containing a mix of group-based regression effects, group-based variances, and group-based interactions. These structures must be tailored to both the researcher’s suspicion and goals, as well as the data layout under consideration. Thus we propose the model class, which is the set of models representing a particular structure. While structures reflect the presence or absence of group-based effects, classes prescribe specific corresponding models in the context of the data layout in question.
Now that the grouping schemes and model classes have been enumerated, consider a linear model with centered observations :
[TABLE]
with model matrix parametrized to be full column rank, regression effects , and errors with covariance matrix . To account for the eight possible model structures previously described, we will partition and . Partition into four components: let represent an intercept common to all models, let represent the SLGF with levels, let contain other regression effects, categorical or continuous, and, let represent interactions with the SLGF. Then . Similarly, partition the model matrix into three matrices corresponding to the data related to , , and , respectively. Thus we can express (1) equivalently as
[TABLE]
In cases where the effect structure for one of the terms in Equation (2) depends on a latent grouping scheme, denote that term with a tilde. For example, in structures with group-based interactions, model a group-based interaction instead of usual the interaction . Similarly, heteroscedastic structures with group-based variances are modeled with error vector and corresponding covariance matrix instead of the homoscedastic counterparts and .
When a model contains a group structure, we arrange the observations within to first contain the observations corresponding to , followed by the observations corresponding to , denoted . We similarly arrange the rows and columns of to contain the corresponding observations, which helps concisely express our proposed forms of heteroscedasticity in . For structures with distinct regression effects by group, arrange , and for classes with distinct variances by group, partition the error vector and covariance matrix such that where \tilde{\Sigma}=\left(\begin{array}[]{c|c}\sigma^{2}_{1}I_{n_{1}\times n_{1}}&0_{n_{1}\times n_{2}}\\ \hline\cr 0_{n_{2}\times n_{1}}&\sigma^{2}_{2}I_{n_{2}\times n_{2}}\end{array}\right)_{N\times N}. Notice , for the effects corresponding to groups and , respectively.
With multiple schemes, classes, and structures under consideration, we next propose a Bayesian model selection approach to assess whether latent groups exist within the data, and if so, identify the appropriate grouping scheme (if present) and class .
2.2 Bayesian Model Selection Details
2.2.1 Model Specification
Denote the set of all candidate models , indexed over all possible schemes and classes , where ; to ease the notational burden, we have denoted as in the subscript of . Although each model matrix and estimators for and depend on the grouping and class under consideration, for notational simplicity we do not index , , or by , , or . Let the vector contain the single precision under homoscedastic models, and the corresponding subgroup precisions and under heteroscedastic models. Denote the precision matrix and , the set of unknown model parameters; then we can express (1) conditionally as
[TABLE]
for a given scheme and class . Thus the likelihood function is given by
[TABLE]
We consider two common prior specifications on the regression effects and precision(s). In both cases, we prefer noninformative priors on the precision(s) because prior information on precision is rarely available. First, we consider a noninformative approach, where we have
[TABLE]
[TABLE]
Next we consider the Zellner-Siow mixture -prior (Zellner and Siow,, 1980; Zellner,, 1986; Liang et al.,, 2008), where
[TABLE]
[TABLE]
[TABLE]
[TABLE]
where . We use the Zellner-Siow mixture -prior in relatively data-poor situations, such as unreplicated two-way layouts, to reduce the dimensionality of the parameter space with improper priors. This consequently lowers the minimal training sample size, a critical component of the fractional Bayes factor approach described in Section 2.2.3. We use the noninformative flat prior where data are more abundant; see Section 3.2 for more detail.
2.2.2 Model Priors for Classes and Schemes
We impose a uniform model prior by model class: . Depending on the data layout and model structures under consideration, various classes may contain different numbers of models; thus we subsequently divide each class’s prior uniformly among the models it contains. For example, in an ANOVA layout, one model class might represent the single mean model, where ; note we do not index this model by as there is no grouping structure present in this class. Alternatively, for a model class containing models with distinct regression effects by grouping scheme, the individual models within the class would be given the prior . By implementing Bayes’ Theorem, posterior model probabilities approximated via the fractional Bayes factor (see Section 2.2.3) can then be easily computed for each individual model and class with the marginal probabilities of each model. For a given model we compute:
[TABLE]
We can then aggregate the overall probability of a given class as , or for a given grouping scheme as ; hence the proposed method allows researchers to draw probabilistic conclusions about both structures and specific grouping schemes.
2.2.3 Fractional Bayes Factor Approach
We next motivate the need to use a fractional Bayes factor approach. Consider a comparison between arbitrary models and , where and represent homoscedastic and heteroscedastic models, respectively, via . For we use , and for we use for arbitrary constants . So the Bayes factor to compare and is
[TABLE]
which is defined only up to the arbitrary constant and thus inappropriate for use in model comparison. Note this problem arises from the use of noninformative priors on the precisions when comparing homoscedastic and heteroscedastic models. Fractional Bayes factors were developed to elicit a cancellation of this constant, rendering a Bayes factor that is well-defined (O’Hagan,, 1995).
For a description of fractional Bayes factors (FBFs), see O’Hagan, (1995) and O’Hagan, (1997). We use the FBF approach of O’Hagan rather than the intrinsic Bayes factor of Berger and Pericchi, (1996), as the need to choose a training sample would be complicated by the potential scarcity of data induced by some clustering schemes.
To fully quantify the fractional marginal probability of a given model , we must compute both and for some user-chosen fractional exponent . O’Hagan, (1995) provides several recommendations for , including , where is the minimal training sample size necessary for to be proper. In this study we choose , which elicits consistent model selection (O’Hagan,, 1995). In many cases, tractable expressions exist for both and . When these integrals are intractable, we use the Laplace approximation in the computation of both and , as cubature-based approximation is computationally expensive and our study of the integrand surface indicates a suitable shape. See online Supplement for justification and further detail.
3 Applications
3.1 Application 1: ANCOVA Models
We first consider an ANCOVA scenario with continuous effect and where the SLGF with levels corresponds to the single categorical predictor effect . We let contain the observed continuous covariate and be the appropriate categorical effect design matrix. We consider models with and without the interaction effect , which governs whether the linear trends share a common slope. Thus we begin with the model given by
[TABLE]
yielding the likelihood function
[TABLE]
To consider the model without an interaction, let . For models with a group effect, let ; for models with a group-by-continuous predictor interaction, let ; and finally for heteroscedastic models, let . As an illustration, we consider eight distinct model classes.
Class I (): the "null" model, with no categorical or continuous covariate effects and homoscedastic error variance, contains 1 model with no grouping schemes;
2. 2.
Class II (): the "simple linear regression (SLR)" model, with a continuous covariate effect only and homoscedastic error variance, contains 1 model with no grouping schemes;
3. 3.
Class III (): the "ANCOVA" model with categorical and continuous covariate effects and homoscedastic error variance, contains 1 model with no grouping schemes;
4. 4.
Class IV (): the "group-contracted ANCOVA" model with group and continuous covariate effects and homoscedastic error variance, contains schemes;
5. 5.
Class V (): the "interaction ANCOVA" model with categorical and continuous covariate effects, level-based interaction, and homoscedastic error variance, contains 1 model with no grouping scheme; 6. 6.
Class VI (): the "group-interaction" model, with group and continuous covariate effects, group-based interaction, and homoscedastic error variance, contains schemes;
7. 7.
Class VII (): the "heteroscedastic group-contracted" model (heteroscedastic Class IV), with group and continuous covariate effects and heteroscedastic error variance, contains schemes;
\bm{Y}=\bm{1}^{T}\alpha+W\tilde{\boldsymbol{\nu}}+V\boldsymbol{\tau}+\boldsymbol{\tilde{\varepsilon}},\ \boldsymbol{\tilde{\varepsilon}}\sim N\left(\bm{0},\,\tilde{\Sigma}=\left[\begin{array}[]{c|c}\sigma_{1}^{2}I_{n_{1}\times n_{1}}&0_{n_{1}\times n_{2}}\\ \hline\cr 0_{n_{2}\times n_{1}}&\sigma_{2}^{2}I_{n_{2}\times n_{2}}\end{array}\right]_{N\times N}\right) 8. 8.
Class VIII (): the "heteroscedastic group-interaction" model (heteroscedastic Class VI), with group and continuous covariate effects, group-based interaction, and heteroscedastic error variance, contains schemes;
\bm{Y}=\bm{1}^{T}\alpha+W\tilde{\boldsymbol{\nu}}+V\boldsymbol{\tau}+U\tilde{\boldsymbol{\rho}}+\tilde{\boldsymbol{\varepsilon}},\ \boldsymbol{\tilde{\varepsilon}}\sim N\left(\bm{0},\,\tilde{\Sigma}=\left[\begin{array}[]{c|c}\sigma_{1}^{2}I_{n_{1}\times n_{1}}&0_{n_{1}\times n_{2}}\\ \hline\cr 0_{n_{2}\times n_{1}}&\sigma_{2}^{2}I_{n_{2}\times n_{2}}\end{array}\right]_{N\times N}\right)
Thus the structure of no group-based effects is represented by Classes I, II, III, and V; group-based regression effects are represented by Classes IV, VI, VII, and VIII; group-based variances are represented by Classes VII and VIII.
We use the FBF approach outlined in Section 2.2.3. Typically, the most complex model considered in an ANCOVA study will have a small minimal training sample size relative to the overall sample size. Thus we choose noninformative priors on the regression effects and precision(s) as described in Equations (5) and (6), so the marginal density of the data conditional on the model is
[TABLE]
This marginal density is analytically integrable over all of the homoscedastic classes defined previously. In heteroscedastic cases (classes VII and VIII), we use a Laplace approximation over the log-variances to approximate and . We demonstrate the performance of this approach through a simulation study in Section 4.1 and through empirical data sets in Section 5.1.
3.2 Application 2: Unreplicated Two-Way Layouts
Next we consider an unreplicated two-way layout with rows, columns, and observations. Because of the unreplicated nature of such a design, the full set of standard interaction effects cannot be incorporated due to insufficient degrees of freedom. Treat the row effects as the SLGF , and let contain the column effects; note by transposing the data table we could treat the column effects as the SLGF as well.
Our choice of model classes is based on the idea of hidden additivity, where interactions are treated as a group-by-column effect (Franck et al.,, 2013). The usual “additive" model accounts for only row and column main effects. The group-based model, partitioned by levels of the row effect, does not include column effects but does consider group-by-column interactions, denoted by . We require at least 2 levels of (rows) in both groups to ensure there are enough degrees of freedom to estimate this group-by-column interaction. We thus begin with the model
[TABLE]
where we let in the additive model, in cases with scheme-based regression effects, and in cases with group-based heteroscedasticity. So the full likelihood function is given by
[TABLE]
In this layout we consider four model classes:
Class I (: the "additive" model, where column effects are equivalent across rows and error variance is constant across all observations, contains 1 model with no grouping schemes;
2. 2.
Class II (: the "group-by-column interaction" model, where levels are divided into two latent groups based on grouping scheme with distinct means and equivalent variance, contains grouping schemes (Franck et al.,, 2013);
3. 3.
Class III (: the "heteroscedastic additive" model, where levels are divided into two latent groups based on grouping scheme with equivalent means and group-based variances, contains grouping schemes;
\bm{Y}=\bm{1}^{T}\alpha+W\boldsymbol{\nu}+V\boldsymbol{\tau}+\tilde{\boldsymbol{\varepsilon}},\ \tilde{\boldsymbol{\varepsilon}}\sim N\left(\bm{0},\,\tilde{\Sigma}=\left[\begin{array}[]{c|c}\sigma_{1}^{2}I_{n_{1}\times n_{1}}&0_{n_{1}\times n_{2}}\\ \hline\cr 0_{n_{2}\times n_{1}}&\sigma_{2}^{2}I_{n_{2}\times n_{2}}\end{array}\right]_{N\times N}\right) 4. 4.
Class IV (: the "heteroscedastic group-by-column interaction" model where levels are divided into two latent groups based on grouping scheme with distinct means and group-based variances, contains grouping schemes;
\bm{Y}=\bm{1}^{T}\alpha+W\boldsymbol{\nu}+U\tilde{\boldsymbol{\rho}}+\tilde{\boldsymbol{\varepsilon}},\ \tilde{\boldsymbol{\varepsilon}}\sim N\left(\bm{0},\,\tilde{\Sigma}=\left[\begin{array}[]{c|c}\sigma_{1}^{2}I_{n_{1}\times n_{1}}&0_{n_{1}\times n_{2}}\\ \hline\cr 0_{n_{2}\times n_{1}}&\sigma_{2}^{2}I_{n_{2}\times n_{2}}\end{array}\right]_{N\times N}\right)
Thus there are models considered.
3.2.1 Bayesian Model Specification: Unreplicated Two-Way Layouts
The two-way unreplicated layout is typically modeled with a high ratio of parameters to data points. Thus we must take care in choosing priors that allow us to successfully incorporate the FBF approach. With noninformative priors on the regression effects, the minimal training sample size needed to estimate regression effects, group-by-column interactions, and error variances would be prohibitively large in relation to the sample size ; indeed, only when . In this work, we propose using the Zellner-Siow mixture -prior on regression coefficients to reduce the dimensionality of the improper prior. Our use of this automatic prior on the regression effects lowers the minimal training sample size to , and thus the fractional exponent , to a value that allows us to successfully implement the FBF. Thus we let
[TABLE]
[TABLE]
[TABLE]
[TABLE]
[TABLE]
The marginal density of the data conditional on the model is
[TABLE]
It is well-known in the homoscedastic case that the Zellner-Siow mixture -prior advantageously elicits a Cauchy distribution on the regression effects (Liang et al.,, 2008). We show the Cauchy result also holds in the heteroscedastic case; that is, ; see online Supplement for proof.
In classes with homoscedasticity, this intergral is intractable over and thus a Laplace approximation is conducted over a single dimension; in heteroscedastic cases, a three-dimensional Laplace approximation is used to integrate , , and ).
4 Simulation Studies
4.1 Simulation Study: ANCOVA Models
In order to simulate ANCOVA data, we generated independent draws uniformly over the interval . Outcomes were then simulated according to each of the eight classes described in Section 3.1 with continuous, categorical, interaction, and/or group-based effects, as well as errors with group-based heteroscedasticity, as appropriate. In this study, we simulated observations at each level of the SLGF; another simulation study with is provided in the Supplement. We note that the settings in Table 1 are not calibrated to be equivalent in the total effect between classes, where such a calibration would be non-trivial. For example, the Class V model contains eight non-null parameters compared to the Class I model’s single parameter; thus we avoid comparing overall performance between classes in this study.
Figure 8 shows that overall our method accurately attributes posterior model probability to the correct model class for the eight classes described in Section 3.1. For Class I (null model) data, the posterior probability of the true class is high relative to the other classes, indicating that our method neither misattributes noise to a group-based structure, nor appears to systematically favor one of the other model classes when it does select the wrong model. Class II (SLR model) data performs similarly, but Classes IV (group-contracted ANCOVA) and VII (group-interaction) capture some posterior probability as well. The Class III (ANCOVA) data setting is favored by the correct class, with Class V (interaction ANCOVA) being the second choice. Class IV (group-contracted ANCOVA) data appears to be the most difficult to detect based on the parameters chosen for this study; although its posterior probabilities are generally higher than the other classes’, it is often mistaken for Class III data, meaning in this case a group-based categorical effect is often mistaken for the full categorical effect. Note that these misattributed classes differ by only two parameters in the regression structure. For Class V (interaction ANCOVA) data, the correct model class is favored. Class VI (group-interaction) data is also correctly favored the majority of the time; when misclassified, it is usually chosen as Class V (interaction ANCOVA) or VIII (heteroscedastic ANCOVA). Classes VII (heteroscedastic group-contracted) and VIII (heteroscedastic group-interaction) perform well; when wrong, Class VII tends to favor a spurious interaction effect.
These results indicate that, in general, our method tends to identify the correct class. When no group-based structure is present, our method tends to not erroneously fit spurious effects or variances. Thus group-based regression effects, interactions, and/or variances are detected with high accuracy when present. We see similar performance with the additional study provided in the online Supplement where .
4.2 Simulation Study: Unreplicated Two-Way Layouts
Row and column effects along with error variance(s) (provided in Table 2) were simulated to generate unreplicated two-way layouts. We consider layouts of size where . Another setting with a smaller effect size, and a study on layouts of size , are given in the Supplement.
Figure 4 shows that our method also tends to favor the true class in the two-way unreplicated layout, with high parameter to data ratio and the mixture -prior. For additive data from Class I, the true class is generally favored; the most probable alternative is typically the heteroscedastic additive Class III. Under large effect size, Class II (group-by-column interaction) data is correctly identified. Class III (heteroscedastic additive) and Class IV (heteroscedastic group-by-column interaction) data are both favored correctly as well regarding both effect sizes.
5 Case Studies
5.1 Case Study: ANCOVA
We revisit the Flurry, (1939) data, shown in the center panel of Figure 1 and in Figure 5. The breaking strength of a chip coated with a film derived from one of three plant materials is studied as a function of a continuous predictor (the thickness of the film) and a categorical predictor (the plant type from which the film was developed). The plot seems to indicate some degree of heteroscedasticity between canna and corn, versus potato; results are shown in Figure 5. Two models receive 93% of the posterior probability: the heteroscedastic group-contracted model, with , and the heteroscedastic group-interaction model, with .
5.2 Case Study: Two-Way Unreplicated Layouts
Franck et al., (2013) and Franck and Osborne, (2016) examine a two-way unreplicated layout describing the copy number variation for genes in six dogs with lymphoma. Samples were taken from healthy and diseased tissue within each dog. It is clear in the plot that dogs behave differently by group: dogs 1, 2, and 5 appear to behave distinctly from dogs 3, 4, and 6. With six subjects by row, there are candidate models, including the null model and three classes each containing 25 models. Thus we show only the six most probable models, which account for 98.1% of the posterior probability; these results are shown in Figure 6. We conclude with high probability that subjects 1, 2, and 5 behave distinctly from 3, 4, and 6; this scheme over all classes has probability . More specifically, we also conclude homoscedastic behavior along with this grouping scheme, the group-by-column interaction model, with .
6 Discussion
Our proposed method is a flexible and intuitive approach to accommodate linear models with latent group-based effects underlying the data. This method generalizes the homoscedastic, two-way unreplicated layout work of Franck, (2018) to a heteroscedastic approach to any linear model with a categorical predictor. By partitioning the data according to the levels of a categorical predictor, we can detect latent structures within the data that might influence the regression effects, error variance, or interaction effects. Often, these group-based structures have straightforward and intuitive interpretations in the context of the data, making the approach particularly useful to domain experts. The use of fractional Bayes factors allows us to compare homoscedastic and heteroscedastic models with minimal prior influence. Regarding priors on the regression effects, we explored the performance of noninformative and mixture -priors in the contexts of ANCOVA and two-way unreplicated layouts, but many other choices on priors and data layouts can be considered. We considered cases in which the number of levels of the SLGF was relatively small, so that a combinatoric search over all possible grouping schemes was computationally feasible. In cases where is large, a Markov-chain model composition (MC3) could be used to search the model space.
In some cases, our method leads to a contraction of the number of effects modeled. For instance, in the ANCOVA examples illustrated in Section 3.1, estimating a group effect as opposed to a categorical effect will reduce the degrees of freedom used to estimate the effect from to . Alternatively, in the two-way unreplicated layout examples, modeling a group-by-column interaction rather than column effects will expand the degrees of freedom used from to .
While we have illustrated our method using two latent groups, an extension to three or more groups is straightforward conceptually but increases the complexity of the models, the number of model classes, and the number of models to consider. Fortunately, the two-group assumption has been shown to lead to useful inferences in several previous works, including Kharrati-Kopaei and Sadooghi-Alvandi, (2007), Franck et al., (2013), Franck and Osborne, (2016), and Franck, (2018).
7 Online Supplement
7.1 Marginal Model Probability Calculations
To prevent numeric underflow we consider model probabilites on the logarithmic scale in all cases. Consider the set of all log-marginal probabilities . Let and . Transform to obtain the set , representing Bayes factors for each model relative to the most probable model. Note the untransformed model probabilities are given by ; thus we can easily use these transformed log-marginal probabilities to obtain posterior model probabilities. Our Laplace approximations generally provided values comparable to quadrature and cubature-based approximations at less computational expense.
7.2 Derivation of Model Probabilities: Noninformative Regression Effect Priors
Let .
7.2.1 Homoscedastic Error Variance, No Group-Based Regression Effects
[TABLE]
[TABLE]
7.2.2 Homoscedastic Error Variance, Group-Based Regression Effects
[TABLE]
[TABLE]
7.2.3 Heteroscedastic Error Variance
A Laplace approximation is used to evaluate , parametrized with respect to the log-variances and . Denote as the log-variance matrix, as the transformation Jacobian, as the mode of the log-marginal distribution, and as the Hessian evaluated at this mode. A subscript of refers to the same quantities calculated with respect to the fractional exponentiated likelihood. The joint modes and are computed using the function optim in R; similarly, the Hessians and are evaluated at these values using the function hessian in the package numderiv (Gilbert and Varadhan,, 2016; R Core Team,, 2017).
We first integrate over the regression effects vector . Let .
[TABLE]
We reparametrize the precisions of and to log-variances and to elicit a shape more conducive to the Laplace approximation.
For and , where and , so .
Thus the Laplace approximation for the fractional marginal model probability is given by
[TABLE]
Empirical comparisons with cubature-based approximations, in addition to the shapes of the marginal densities and , indicate that this Laplace approximation is satisfactory.
7.2.4 Special Case: Heteroscedastic Error Variance with no Regression Effects Spanning Two Variances
[TABLE]
[TABLE]
7.3 Derivation of Model Probabilities: Mixture Regression Effect Priors
[TABLE]
After integrating over , and , we obtain
[TABLE]
To execute the Laplace approximation over , we must obtain the mode , the root of the equation
[TABLE]
where . We also require the Hessian evaluated at the mode,
[TABLE]
These expressions are appropriate to use in homoscedastic classes with either global or distinct regression effects; we simply compute the corresponding and use the appropriate based on the model under consideration.
In classes with heteroscedasticity, the integral is intractable over both and ; thus we must employ a three-dimensional Laplace approximation to evaluate this integral. For computational ease and to improve the accuracy of the approximation, we again parametrize with respect to the log-variance; let represent the log-variance matrix. Denote ; then integrating out the global intercept and regression effects yields an expression for :
[TABLE]
The joint mode is computed using the function optim in R; similarly, the Hessian is computed at this value using the function hessian in the package numderiv.
7.3.1 Heteroscedastic Zellner-Siow Cauchy Result
Let and . Then and .
[TABLE]
7.4 Supplemental ANCOVA Simulation Study
We let for each level of the SLGF. The parameter settings are provided in Table 1.
7.5 Supplemental Two-way Layout Simulation Study
We provide three additional simulation studies in the twoway layout scenario: layouts with a smaller effect size than the study provided in Section 4.2, as well as studies with larger and smaller effect sizes.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Bartlett, (1937) Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proceedings of the Royal Statistical Society, Series A , 160(901):268–282.
- 2Berger and Pericchi, (1996) Berger, J. O. and Pericchi, L. R. (1996). The intrinsic bayes factor for model selection and prediction. Journal of the American Statistical Association , 91(433):109–122.
- 3Berry, (1993) Berry, W. D. (1993). Quantitative Applications in the Social Sciences: Understanding Regression Assumptions . SAGE Publications Ltd., Thousand Oaks, CA.
- 4Bickel, (1978) Bickel, P. J. (1978). Using residuals robustly i: Tests for heteroscedasticity, nonlinearity. Annals of Statistics , 6:266–291.
- 5Boscardin and Gelman, (1994) Boscardin, W. J. and Gelman, A. (1994). Bayesian computation for parametric models of heteroscedasticity in the linear model.
- 6Box and Cox, (1964) Box, G. E. and Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society , 26:211–252.
- 7Box and Hill, (1974) Box, G. E. P. and Hill, W. J. (1974). Correcting inhomogeneity of variance with power transformation weighting. Technometrics , 16:385–389.
- 8Breusch and Pagan, (1979) Breusch, T. S. and Pagan, A. R. (1979). A simple test for heteroscedasticity and random coefficient variation. Econometrica (pre-1986) , 47(5):1287. Copyright - Copyright Econometric Society Sep 1979; Last updated - 2011-10-07; CODEN - ECMTA 7.
