Detection of latent heteroscedasticity and group-based regression   effects in linear models via Bayesian model selection

Thomas A. Metzger; Christopher T. Franck

arXiv:1903.01035·stat.ME·March 5, 2019·Technometrics

Detection of latent heteroscedasticity and group-based regression effects in linear models via Bayesian model selection

Thomas A. Metzger, Christopher T. Franck

PDF

TL;DR

This paper introduces Bayesian model selection methods to detect hidden group structures and heteroscedasticity in linear models, improving the understanding of complex relationships among categorical predictors.

Contribution

It extends the concept of latent grouping factors to general linear models and provides Bayesian approaches to identify hidden structures and interactions.

Findings

01

Reveals latent group-based heteroscedasticity in data

02

Detects hidden groupings among categorical predictor levels

03

Improves model accuracy by accounting for complex structures

Abstract

Standard linear modeling approaches make potentially simplistic assumptions regarding the structure of categorical effects that may obfuscate more complex relationships governing data. For example, recent work focused on the two-way unreplicated layout has shown that hidden groupings among the levels of one categorical predictor frequently interact with the ungrouped factor. We extend the notion of a "latent grouping factor" to linear models in general. The proposed work allows researchers to determine whether an apparent grouping of the levels of a categorical predictor reveals a plausible hidden structure given the observed data. Specifically, we offer Bayesian model selection-based approaches to reveal latent group-based heteroscedasticity, regression effects, and/or interactions. Failure to account for such structures can produce misleading conclusions. Since the presence of latent…

Tables5

Table 1. Table 1: Settings for the eight model classes in the ANCOVA simulation study where α = 0 𝛼 0 \alpha=0 , K = 4 𝐾 4 K=4 , and N = 360 𝑁 360 N=360 .

Class	Parameters
I (Null)	$σ^{2} = 1$
II (SLR)	$τ = 0.5,$ $σ^{2} = 1$
III (ANCOVA)	$α = 2.0, ν = (4.0, 6.0, 8.0)$ , $τ = 0.5$ , $σ^{2} = 1$
IV (Group-Contracted ANCOVA)	$α = 0.0, \tilde{ν} = (3.0)$ , $τ = 0.5$ , $σ^{2} = 1$
V (Interaction ANCOVA)	$α = 0.5, ν = (1.0, 1.5, 2.0)$ , $ρ = (0.25, 0.5, 0.75, 1.0)$ , $τ = 0.5$ , $σ^{2} = 1$
VI (Group-Interaction ANCOVA)	$α = 0.0, \tilde{ν} = (0.8)$ , $\tilde{ρ} = (0, 1)$ , $τ = 1$ , $σ^{2} = 1$
VII (Heteroscedastic	$α = 0.0, \tilde{ν} = (3.0)$ , $τ = 0.5$ ,
Group-Contracted ANCOVA)	$σ_{1}^{2} = 1$ , $σ_{2}^{2} = 5$
VIII (Heteroscedastic	$α = 0.0, \tilde{ν} = (3.0)$ , $τ = 0.5$ , $\tilde{ρ} = (0, 1)$ ,
Group-Interaction ANCOVA)	$σ_{1}^{2} = 1$ , $σ_{2}^{2} = 5$

Table 2. Table 2: Settings for the four model classes in the two-way unreplicated layout simulation study with 10 × 5 10 5 10\times 5 layouts where α = 0 𝛼 0 \alpha=0 .

Class	Parameters
I (Additive Model)	$α = 1, 𝝂 \in {2, 3, 4, 5, 6, 7, 8, 9, 10}$ , $𝝉 \in {1, 2, 3, 4, 5}$ , $σ^{2} = 1$
II (Group-by-Column	$α = 1, 𝝂 \in {2, 3, 4, 5, 6, 7, 8, 9, 10}$ , $𝝉_{1} \in {1.0, 1.8, 2.6, 3.4, 4.2}$ ,
Interaction)	$𝝉_{2} \in {4.2, 3.4, 2.6, 1.8, 1.0}$ , $σ^{2} = 1$
III (Heteroscedastic	$α = 1, 𝝂 \in {2, 3, 4, 5, 6, 7, 8, 9, 10}$ , $𝝉 \in {1, 2, 3, 4, 5}$ ,
Additive)	$σ_{1}^{2} = 1.0$ , $σ_{2}^{2} = 0.10$
IV (Heteroscedastic Group-	$α = 1, 𝝂 \in {2, 3, 4, 5, 6, 7, 8, 9, 10}$ , $𝝉_{1} \in {1.0, 1.8, 2.6, 3.4, 4.2}$ ,
by-Column Interaction)	$𝝉_{2} \in {4.2, 3.4, 2.6, 1.8, 1.0}$ , $σ_{1}^{2} = 1.0$ , $σ_{2}^{2} = 0.10$

Table 3. Table 3: Settings for the four model classes in the 10 × 5 10 5 10\times 5 two-way layout simulation study with smaller effect size.

Class	Parameters
I (Additive Model)	$𝝂 \in {1, 2, 3, 4, 5, 7, 8, 9, 10}$ , $𝝉 \in {1, 2, 3, 4, 5}$ , $σ^{2} = 1$
II (Group-by-Column	$𝝂 \in {1, 2, 3, 4, 5, 7, 8, 9, 10}$ , $𝝉_{1} \in {1.0, 1.5, 2.0, 2.5, 3.0}$ ,
Interaction)	$𝝉_{2} \in {3.0, 2.5, 2.0, 1.5, 1.0}$ , $σ^{2} = 1$
III (Heteroscedastic	$𝝂 \in {1, 2, 3, 4, 5, 7, 8, 9, 10}$ , $𝝉 \in {1, 2, 3, 4, 5}$ ,
Additive)	$σ_{1}^{2} = 1.0$ , $σ_{2}^{2} = 0.25$
IV (Heteroscedastic Group-	$𝝂 \in {1, 2, 3, 4, 5, 7, 8, 9, 10}$ , $𝝉_{1} \in {1.0, 1.5, 2.0, 2.5, 3.0}$ ,
by-Column Interaction)	$𝝉_{2} \in {3.0, 2.5, 2.0, 1.5, 1.0}$ , $σ_{1}^{2} = 1.0$ , $σ_{2}^{2} = 0.25$

Table 4. Table 4: Settings for the four model classes in the two-way layout simulation study with 5 × 5 5 5 5\times 5 layouts with larger effect size.

Class	Parameters
I (Additive Model)	$𝝂 \in {1, 1.5, 2, 2.5, 3}$ , $𝝉 \in {1, 2, 3, 4, 5}$ , $σ^{2} = 1$
II (Group-by-Column	$𝝂 \in {1, 2, 3, 4, 5}$ , $𝝉_{1} \in {1.0, 1.8, 2.6, 3.4, 4.2}$ ,
Interaction)	$𝝉_{2} \in {4.2, 3.4, 2.6, 1.8, 1.0}$ , $σ^{2} = 1$
III (Heteroscedastic	$𝝂 \in {1, 2, 3, 4, 5}$ , $𝝉 \in {1, 2, 3, 4, 5}$ ,
Additive)	$σ_{1}^{2} = 1.0$ , $σ_{2}^{2} = 0.25$
IV (Heteroscedastic Group-	$𝝂 \in {1, 2, 3, 4, 5}$ , $𝝉_{1} \in {1.0, 1.8, 2.6, 3.4, 4.2}$ ,
by-Column Interaction)	$𝝉_{2} \in {4.2, 3.4, 2.6, 1.8, 1.0}$ , $σ_{1}^{2} = 1.0$ , $σ_{2}^{2} = 0.25$

Table 5. Table 5: Settings for the four model classes in the two-way layout simulation study with 5 × 5 5 5 5\times 5 layouts with smaller effect size.

Class	Parameters
I (Additive Model)	$𝝂 \in {1, 2, 3, 4, 5}$ , $𝝉 \in {1, 2, 3, 4, 5}$ , $σ^{2} = 1$
II (Group-by-Column	$𝝂 \in {1, 2, 3, 4, 5}$ , $𝝉_{1} \in {1.0, 1.5, 2.0, 2.5, 3.0}$ ,
Interaction)	$𝝉_{2} \in {3.0, 2.5, 2.0, 1.5, 1.0}$ , $σ^{2} = 1$
III (Heteroscedastic	$𝝂 \in {1, 2, 3, 4, 5}$ , $𝝉 \in {1, 2, 3, 4, 5}$ ,
Additive)	$σ_{1}^{2} = 1.0$ , $σ_{2}^{2} = 0.25$
IV (Heteroscedastic Group-	$𝝂 \in {1, 2, 3, 4, 5}$ , $𝝉_{1} \in {1.0, 1.5, 2.0, 2.5, 3.0}$ ,
by-Column Interaction)	$𝝉_{2} \in {3.0, 2.5, 2.0, 1.5, 1.0}$ , $σ_{1}^{2} = 1.0$ , $σ_{2}^{2} = 0.25$

Equations123

Y = X β + ε

Y = X β + ε

Y = 1^{T} α + W ν + V τ + U ρ + ε .

Y = 1^{T} α + W ν + V τ + U ρ + ε .

Y ∣ m_{s}^{c}, θ \sim N (X β, Φ^{- 1})

Y ∣ m_{s}^{c}, θ \sim N (X β, Φ^{- 1})

P (Y ∣ θ, m_{s}^{c}) = (2 π)^{- \frac{N}{2}} ∣Φ ∣^{\frac{1}{2}} \cdot exp {- \frac{1}{2} (Y - X β)^{T} Φ (Y - X β)}

P (Y ∣ θ, m_{s}^{c}) = (2 π)^{- \frac{N}{2}} ∣Φ ∣^{\frac{1}{2}} \cdot exp {- \frac{1}{2} (Y - X β)^{T} Φ (Y - X β)}

P (β, φ, ∣ m_{s}^{c}) \propto φ^{- 1} (homoscedastic models), or

P (β, φ, ∣ m_{s}^{c}) \propto φ^{- 1} (homoscedastic models), or

P (β, φ, ∣ m_{s}^{c}) \propto φ_{1}^{- 1} \cdot φ_{2}^{- 1} (heteroscedastic models)

P (β, φ, ∣ m_{s}^{c}) \propto φ_{1}^{- 1} \cdot φ_{2}^{- 1} (heteroscedastic models)

P (α, φ) \propto φ^{- 1} (homoscedastic models),

P (α, φ) \propto φ^{- 1} (homoscedastic models),

P (α, φ) \propto φ_{1}^{- 1} \cdot φ_{2}^{- 1} (heteroscedastic models),

P (α, φ) \propto φ_{1}^{- 1} \cdot φ_{2}^{- 1} (heteroscedastic models),

β_{- α} ∣ φ, m_{s}^{c} \sim N (0, g (X^{T} Φ^{- 1} X)^{- 1}), and

β_{- α} ∣ φ, m_{s}^{c} \sim N (0, g (X^{T} Φ^{- 1} X)^{- 1}), and

g \sim IG (\frac{1}{2}, \frac{N}{2})

g \sim IG (\frac{1}{2}, \frac{N}{2})

P (m_{s^{'}}^{c^{'}} ∣ Y) = \frac{P ( Y ∣ m _{s^{'}}^{c^{'}} ) P ( m _{s^{'}}^{c^{'}} )}{c = I \sum C s = 1 \sum S P ( Y ∣ m _{s}^{c} ) P ( m _{s}^{c} )}

P (m_{s^{'}}^{c^{'}} ∣ Y) = \frac{P ( Y ∣ m _{s^{'}}^{c^{'}} ) P ( m _{s^{'}}^{c^{'}} )}{c = I \sum C s = 1 \sum S P ( Y ∣ m _{s}^{c} ) P ( m _{s}^{c} )}

B^{12} = \frac{a \cdot \int P ( Y ∣ β , Φ , m _{1} ) P ( β ) P ( φ ) d β d φ \cdot P ( m _{2} )}{a ^{'} \cdot \int P ( Y ∣ β , Φ , m _{2} ) P ( β ) P ( φ ) d β d φ \cdot P ( m _{1} )}

B^{12} = \frac{a \cdot \int P ( Y ∣ β , Φ , m _{1} ) P ( β ) P ( φ ) d β d φ \cdot P ( m _{2} )}{a ^{'} \cdot \int P ( Y ∣ β , Φ , m _{2} ) P ( β ) P ( φ ) d β d φ \cdot P ( m _{1} )}

Y = 1^{T} α + W ν + V τ + U ρ + ε

Y = 1^{T} α + W ν + V τ + U ρ + ε

P (Y ∣ m_{s}^{c}, α, ν, τ, ρ, φ) = (2 π)^{- \frac{N}{2}} ∣Φ ∣^{\frac{1}{2}} \cdot exp {- \frac{1}{2} (Y - 1^{T} α - W ν - V τ - U ρ)^{T} Φ (Y - 1^{T} α - W ν - V τ - U ρ)}

P (Y ∣ m_{s}^{c}, α, ν, τ, ρ, φ) = (2 π)^{- \frac{N}{2}} ∣Φ ∣^{\frac{1}{2}} \cdot exp {- \frac{1}{2} (Y - 1^{T} α - W ν - V τ - U ρ)^{T} Φ (Y - 1^{T} α - W ν - V τ - U ρ)}

P (Y ∣ m_{s}^{c}) = \iint P (Y ∣ β, φ, m_{s}^{c}) P (β) P (φ) d β d φ

P (Y ∣ m_{s}^{c}) = \iint P (Y ∣ β, φ, m_{s}^{c}) P (β) P (φ) d β d φ

Y = 1^{T} α + W ν + V τ + U \tilde{ρ} + ε

Y = 1^{T} α + W ν + V τ + U \tilde{ρ} + ε

P (Y ∣ m_{s}^{c}, α, ν, τ, \tilde{ρ}, φ) = (2 π)^{- \frac{N}{2}} ∣Φ ∣^{\frac{1}{2}} exp {- \frac{1}{2} (Y - 1 α - W ν - V τ - U \tilde{ρ})^{T} Φ (Y - 1 α - W ν - V τ - U \tilde{ρ})}

P (Y ∣ m_{s}^{c}, α, ν, τ, \tilde{ρ}, φ) = (2 π)^{- \frac{N}{2}} ∣Φ ∣^{\frac{1}{2}} exp {- \frac{1}{2} (Y - 1 α - W ν - V τ - U \tilde{ρ})^{T} Φ (Y - 1 α - W ν - V τ - U \tilde{ρ})}

P (α) \propto 1,

P (α) \propto 1,

P (β ∣ φ, g) = N (0, g (X^{T} Φ X)^{- 1}),

P (β ∣ φ, g) = N (0, g (X^{T} Φ X)^{- 1}),

P (φ) \propto φ^{- 1} (for models with homoscedasticity),

P (φ) \propto φ^{- 1} (for models with homoscedasticity),

P (φ) \propto φ_{1}^{- 1} φ_{2}^{- 1} (for models with heteroscedasticity), and

P (φ) \propto φ_{1}^{- 1} φ_{2}^{- 1} (for models with heteroscedasticity), and

P (g) = IG (\frac{1}{2}, \frac{N}{2})

P (g) = IG (\frac{1}{2}, \frac{N}{2})

P (Y ∣ m_{s}^{c}) = \iiiint P (Y ∣ α, β, φ, m_{s}^{c}) P (α) P (β ∣ φ, g) P (φ) P (g) d α d β d φ d g

P (Y ∣ m_{s}^{c}) = \iiiint P (Y ∣ α, β, φ, m_{s}^{c}) P (α) P (β ∣ φ, g) P (φ) P (g) d α d β d φ d g

P (Y ∣ m)

P (Y ∣ m)

= 0 \int \infty - \infty \int \infty (2 π)^{^{- \frac{N b}{2}}} ∣Σ ∣^{- \frac{b}{2}} exp {- b \cdot \frac{1}{2} [Y^{T} Σ^{- 1} Y - 2 β^{T} X^{T} Σ^{- 1} Y + β^{T} X^{T} Σ^{- 1} X β]} Σ^{- 1} d β d γ

= 0 \int \infty - \infty \int \infty (2 π)^{^{- \frac{N b}{2}}} γ^{\frac{N b}{2} - 1} exp {- b \cdot \frac{1}{2} [Y^{T} Σ^{- 1} Y - 2 β^{T} X^{T} Σ^{- 1} Y + β^{T} X^{T} Σ^{- 1} X β]} d β d γ

= 0 \int \infty - \infty \int \infty (2 π)^{^{- \frac{N b}{2}}} γ^{\frac{N b}{2} - 1} exp {- b \cdot \frac{γ}{2} [β^{T} X^{T} X β - 2 β^{T} X^{T} Y + Y^{T} H Y]} \times

exp {- b \cdot \frac{γ}{2} [Y^{T} Y - Y^{T} H Y]} d β d γ

= 0 \int \infty (2 π)^{^{- \frac{N b}{2}}} γ^{\frac{N b}{2} - 1} (2 π)^{+ \frac{p}{2}} ∣ b^{- 1} γ^{- 1} (X^{T} X)^{- 1} ∣^{+ \frac{1}{2}} exp {- b \cdot \frac{γ}{2} [Y^{T} (I - H) Y]} d γ

= (2 π)^{- \frac{N b - P}{2}} b^{- \frac{P}{2}} ∣ X^{T} X ∣^{- \frac{1}{2}} Γ (\frac{N b - P}{2}) (\frac{b \cdot SSResid ^{I}}{2})^{- \frac{N b - P}{2}}

= π^{- \frac{N b - P}{2}} b^{- \frac{N b}{2}} ∣ X^{T} X ∣^{- \frac{1}{2}} Γ (\frac{N b - P}{2}) (SSResid^{I})^{- \frac{N b - P}{2}} . □

q^{b} (Y ∣ m)

q^{b} (Y ∣ m)

= \frac{π ^{- \frac{N - P}{2}} ∣ X ^{T} X ∣ ^{- \frac{1}{2}} Γ ( \frac{N - P}{2} ) ( SSResid ^{I} ) ^{- \frac{N - P}{2}} .}{π ^{- \frac{N b - P}{2}} b ^{- \frac{N b}{2}} ∣ X ^{T} X ∣ ^{- \frac{1}{2}} Γ ( \frac{N b - P}{2} ) ( SSResid ^{I} ) ^{- \frac{N b - P}{2}}}

= π^{- \frac{N ( 1 - b )}{2}} b^{\frac{N b}{2}} (SSResid^{I})^{- \frac{N ( 1 - b )}{2}} \frac{Γ ( \frac{N - P}{2} )}{Γ ( \frac{N b - P}{2} )} . □

P (Y ∣ m_{s})

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Detection of latent heteroscedasticity and group-based regression effects in linear models via Bayesian model selection

Thomas A. Metzger

Department of Statistics, Virginia Tech

and

Christopher T. Franck

Department of Statistics, Virginia Tech

Abstract

Standard linear modeling approaches make potentially simplistic assumptions regarding the structure of categorical effects that may obfuscate more complex relationships governing data. For example, recent work focused on the two-way unreplicated layout has shown that hidden groupings among the levels of one categorical predictor frequently interact with the ungrouped factor. We extend the notion of a “latent grouping factor” to linear models in general. The proposed work allows researchers to determine whether an apparent grouping of the levels of a categorical predictor reveals a plausible hidden structure given the observed data. Specifically, we offer Bayesian model selection-based approaches to reveal latent group-based heteroscedasticity, regression effects, and/or interactions. Failure to account for such structures can produce misleading conclusions. Since the presence of latent group structures is frequently unknown a priori to the researcher, we use fractional Bayes factor methods and mixture $g$ -priors to overcome lack of prior information.

Keywords: fractional Bayes factor, mixture g-priors, model selection, hidden additivity, latent grouping

1 Introduction

Linear models with categorical predictors are among the most frequently used statistical models, but oversimplification of the variance or regression effect structures can misrepresent key relationships within observed data. Figure 1 shows three relevant data sets. First, the left panel shows a simulated one-way ANOVA experiment, based on the statistics reported in Welch, (1951). The horizontal axis represents the levels of a treatment factor, and the vertical axis represents a continuous response. The center panel is an analysis of covariance (ANCOVA) that analyzes the breaking strength of a starch chip (Flurry,, 1939). The horizontal axis represents the chip’s thickness in $10^{-4}$ inches, the vertical axis represents the breaking strength in grams, and the point shapes represent the plant from which the starch was derived. Finally, the right panel is an unreplicated two-way layout, measuring the genomic hybridization signal in dogs with lymphoma (Franck et al.,, 2013). The horizontal axis represents whether the sample was taken from normal or tumor tissue, the vertical axis represents the intensity of the genomic hybridization signal, and the individual lines represent six dogs studied. In each case, the levels of the categorical predictor appear to fall into one of two groups, although the group structure is unknown before data collection. The apparent group structure for the data in Figure 1 is represented by dark and light gray.

Situations similar to those illustrated in Figure 1 often arise in research. Perhaps after plotting the data or in reviewing previous related work, the researcher begins to suspect that there is a hidden grouping within the levels of the categorical predictor. We call this predictor the suspected latent grouping factor, or SLGF. This work aims to determine whether the latent groupings are plausible and how the group structure affects the response.

We must consider the SLGF with two key ideas in mind: first, that the SLGF might manifest itself through one of several structures in the data, namely, (i) group-specific regression effects, (ii) group-specific variances, and/or (iii) hidden interactions between groups and other model predictors. There are eight possible combinations of structures. Second, the specific assignment of levels to groups is unknown. Both of these aspects of the SLGF must be learned from the data in an unsupervised fashion: specifically, our proposed method uses Bayesian model selection to assess the plausibility of the SLGF’s impact on the data based on posterior probability. Formal detail can be found in Section 2.

Common linear model assumptions include homoscedasticity and a unique effect of the response at each level of categorical predictors; for a thorough review see Berry, (1993). Violations of these assumptions can have a wide variety of negative consequences on inference; see Rencher and Schaalje, (2008), Deschamps, (1991) and Scheffé, (1959). Model misspecification can lead to additional problems; see Rencher and Schaalje, (2008), Rao, (1971), and Deegan, (1976).

Regarding the detection of heteroscedasticity, many analyses follow a two-stage approach. These include the methods proposed by Bartlett, (1937), Levene, (1960), Brown and Forsythe, (1974), and Hartley, (1950). When heteroscedasticity is believed to be a function of a continuous predictor, many methods are available, including Breusch and Pagan, (1979), Cook and Weisberg, (1983), White, (1980), Glejser, (1969), Park, (1966), Box and Hill, (1974), Bickel, (1978), Jobson and Fuller, (1980), and Carroll and Ruppert, (1982).

Regarding inference in the presence of heteroscedasticity, methods include those of Box and Cox, (1964), Carroll and Ruppert, (1988), Perthes, (1855); Morrison, (1983), Huber, (1967), Eicker, (1967), Cragg, (1983), Hildreth and Houck, (1968), Long and Ervin, (2000), Polasek et al., (1998), Polasek and Pötzelberger, (1994), Cuervo and Achcar, (2009), Boscardin and Gelman, (1994), MacKinnon and White, (1985), Dumitrascu et al., (2015), and White, (1980); for a review see Hayes and Cai, (2007) and Cribari-Neto and Zarkos, (1999). Many methods exist to detect heteroscedasticity or conduct inference in its presence. To our knowledge, ours is the first proposal of latent group-based heteroscedasticity alongside possibly unique regression effects and/or hidden interactions.

While an extension to more than two groups is natural, our choice of two groups is still a reasonable approach in many problems; Kharrati-Kopaei and Sadooghi-Alvandi, (2007) and Franck, (2018) study factor level groupings based on two groups in unreplicated two-way layouts, while Goldfeld and Quandt, (1965) model heteroscedasticity as a function of two groups, where groups are created by partitioning observations ordinally.

The main contribution of this work is to propose a method of probabilistically detecting the presence of hidden categorical level groupings, and to describe the model specifications that capture the effect of such groupings, through the use of Bayesian model selection. This work generalizes the notion of latent group-based effects from the two-way layout to linear models; see Kharrati-Kopaei and Sadooghi-Alvandi, (2007); Franck et al., (2013); Franck and Osborne, (2016), and Franck, (2018). There is no unified Bayesian model selection approach to account for these structures in the context of latent groupings of the levels of a categorical predictor in general. Although we illustrate our method in the contexts of the three specific settings shown in Figure 1, our proposal is flexible enough to be used in the context of any linear model with a categorical predictor.

The remaining structure of this paper is as follows. Section 2 describes the candidate models in the context of the SLGF, as well as the fractional Bayes factor and Bayesian model selection details. Section 3 describes our proposed Bayesian model specification in contexts of ANCOVA models and unreplicated two-way layouts. Section 4 describes a simulation study to assess the performance of our method. Section 5 applies our proposed method to the empirical data sets of Figure 1. Section 6 summarizes the proposed method and provides some additional comments. Additional mathematical details and simulation results for one-way ANOVA data are provided in the supplement.

2 Proposed Method

2.1 Specification of Linear Models with Categorical Predictors

We begin the development of our approach by elucidating the assignment of the levels of the SLGF into groups. As an illustration, consider the data analyzed by Franck et al., (2013), shown in the right panel of Figure 1: an unreplicated two-way layout with 6 rows and 2 columns. We choose the dogs as the SLGF. Three examples of possible SLGF level assignments are shown in Figure 2:

For a SLGF with levels $k=1,\ldots,K+1$ , let $\bm{k}=(1,\dots,K+1)^{T}$ . We formally define a grouping scheme, denoted ${d_{s}}(\bm{k})$ , as a partitioning that assigns the data into two groups based on each observation’s corresponding level of the SLGF. Let $\mathcal{S}$ be the set of all possible schemes, and $s=1,\dots,S$ index the possible schemes. We refer to the example scheme 3, in the rightmost panel of Figure 2, to motivate the subsequent notational definitions; this example partitions levels $k=\$ 1, 4, and 5 separately from 2, 3, and 6. For a scheme ${d_{s}}(\bm{k})$ , a colon separates the levels each group comprises; for example, the scheme of example 3 is denoted as ${d_{s}}(\bm{k})=1,4,5:2,3,6$ . The most effective method to partition the levels of the SLGF into groups depends on the size and nature of the study in question. Many problems have $K$ small enough that a combinatoric search over all possible grouping schemes is reasonable; see Franck et al., (2013), Franck and Osborne, (2016), Franck, (2018), and Kharrati-Kopaei and Sadooghi-Alvandi, (2007). We use the combinatoric search approach exclusively in this study.

Next we formalize the idea of model structures in the context of specific linear models. Recall from Section 1 that we must potentially accommodate eight model structures containing a mix of group-based regression effects, group-based variances, and group-based interactions. These structures must be tailored to both the researcher’s suspicion and goals, as well as the data layout under consideration. Thus we propose the model class, which is the set of models representing a particular structure. While structures reflect the presence or absence of group-based effects, classes prescribe specific corresponding models in the context of the data layout in question.

Now that the grouping schemes and model classes have been enumerated, consider a linear model with $N$ centered observations $\boldsymbol{Y}$ :

[TABLE]

with model matrix $X$ parametrized to be full column rank, regression effects $\boldsymbol{\beta}$ , and errors $\boldsymbol{\varepsilon}\overset{\text{iid}}{\sim}N(\boldsymbol{0},\,\Sigma)$ with covariance matrix $\Sigma$ . To account for the eight possible model structures previously described, we will partition $\bm{Y},\,\boldsymbol{\beta},\,\boldsymbol{\varepsilon},$ and $\Sigma$ . Partition $\boldsymbol{\beta}$ into four components: let $\alpha$ represent an intercept common to all models, let $\boldsymbol{\nu}=\{\nu_{k}\}_{k=1}^{K}$ represent the SLGF with $K+1$ levels, let $\boldsymbol{\tau}=\{\tau_{j}\}_{j=1}^{J}$ contain $J$ other regression effects, categorical or continuous, and, let $\boldsymbol{\rho}=\{\rho_{\ell}\}_{\ell=1}^{L}$ represent $L$ interactions with the SLGF. Then $\boldsymbol{\beta}_{(1+K+J+L)\times 1}=(\alpha,\ \boldsymbol{\nu}_{K\times 1},\ \boldsymbol{\tau}_{J\times 1},\,\boldsymbol{\rho}_{L\times 1})$ . Similarly, partition the model matrix $X_{N\times(1+K+J+L)}=(\bm{1}^{T}_{N\times 1}\,|W_{N\times K}\,|\,V_{N\times J}\,|\,U_{N\times L})$ into three matrices corresponding to the data related to $\alpha,\,\boldsymbol{\nu}$ , $\boldsymbol{\tau}$ , and $\boldsymbol{\rho}$ , respectively. Thus we can express (1) equivalently as

[TABLE]

In cases where the effect structure for one of the terms in Equation (2) depends on a latent grouping scheme, denote that term with a tilde. For example, in structures with group-based interactions, model a group-based interaction $\tilde{\boldsymbol{\rho}}$ instead of usual the interaction $\boldsymbol{\rho}$ . Similarly, heteroscedastic structures with group-based variances are modeled with error vector $\tilde{\boldsymbol{\varepsilon}}$ and corresponding covariance matrix $\tilde{\Sigma}$ instead of the homoscedastic counterparts $\boldsymbol{\varepsilon}$ and $\Sigma$ .

When a model contains a group structure, we arrange the observations within $\bm{Y}$ to first contain the $n_{1}$ observations corresponding to $s_{1}$ , followed by the $n_{2}$ observations corresponding to $s_{2}$ , denoted $\bm{Y}=(\bm{Y}_{1},\bm{Y}_{2})$ . We similarly arrange the rows and columns of $X$ to contain the corresponding observations, which helps concisely express our proposed forms of heteroscedasticity in $\Sigma$ . For structures with distinct regression effects by group, arrange $\tilde{\boldsymbol{\nu}}=(\tilde{\boldsymbol{\nu}}_{1},\,\tilde{\boldsymbol{\nu}}_{2})$ , and for classes with distinct variances by group, partition the error vector and covariance matrix such that $\boldsymbol{\tilde{\varepsilon}}\sim N(\bm{0},\tilde{\Sigma})$ where $\tilde{\Sigma}=\left(\begin{array}[]{c|c}\sigma^{2}_{1}I_{n_{1}\times n_{1}}&0_{n_{1}\times n_{2}}\\ \hline\cr 0_{n_{2}\times n_{1}}&\sigma^{2}_{2}I_{n_{2}\times n_{2}}\end{array}\right)_{N\times N}$ . Notice $N=n_{1}+n_{2}$ , for the effects corresponding to groups $s_{1}$ and $s_{2}$ , respectively.

With multiple schemes, classes, and structures under consideration, we next propose a Bayesian model selection approach to assess whether latent groups exist within the data, and if so, identify the appropriate grouping scheme ${d_{s}}(\bm{k})\in\mathcal{S}$ (if present) and class $c\in\mathcal{C}$ .

2.2 Bayesian Model Selection Details

2.2.1 Model Specification

Denote the set of all candidate models $\mathcal{M}=\{m_{s}^{c}\}$ , indexed over all possible schemes $s=1,...,{S}$ and classes $c=1,...,{C}$ , where $|\mathcal{M}|=M$ ; to ease the notational burden, we have denoted ${d_{s}}(\bm{k})$ as $s$ in the subscript of $m$ . Although each model matrix $X$ and estimators for $\boldsymbol{\beta}$ and $\Sigma$ depend on the grouping and class under consideration, for notational simplicity we do not index $X$ , $\boldsymbol{\beta}$ , or $\Sigma$ by $m$ , $s$ , or $c$ . Let the vector $\boldsymbol{\varphi}$ contain the single precision $\varphi:=\frac{1}{\sigma^{2}}$ under homoscedastic models, and the corresponding subgroup precisions $\varphi_{1}$ and $\varphi_{2}$ under heteroscedastic models. Denote the precision matrix $\Phi=\Sigma^{-1}$ and $\bm{\theta}=\{\boldsymbol{\varphi},\boldsymbol{\beta}\}$ , the set of unknown model parameters; then we can express (1) conditionally as

[TABLE]

for a given scheme ${d_{s}}(\bm{k})\in\mathcal{S}$ and class $c\in\mathcal{C}$ . Thus the likelihood function is given by

[TABLE]

We consider two common prior specifications on the regression effects and precision(s). In both cases, we prefer noninformative priors on the precision(s) because prior information on precision is rarely available. First, we consider a noninformative approach, where we have

[TABLE]

Next we consider the Zellner-Siow mixture $g$ -prior (Zellner and Siow,, 1980; Zellner,, 1986; Liang et al.,, 2008), where

[TABLE]

where $\boldsymbol{\beta}_{-\alpha}:=\boldsymbol{\beta}\setminus\{\alpha\}$ . We use the Zellner-Siow mixture $g$ -prior in relatively data-poor situations, such as unreplicated two-way layouts, to reduce the dimensionality of the parameter space with improper priors. This consequently lowers the minimal training sample size, a critical component of the fractional Bayes factor approach described in Section 2.2.3. We use the noninformative flat prior where data are more abundant; see Section 3.2 for more detail.

2.2.2 Model Priors for Classes and Schemes

We impose a uniform model prior by model class: $P(m^{c})=\sum_{s=1}^{S}P(m_{s}^{c}):=\frac{1}{C}$ . Depending on the data layout and model structures under consideration, various classes may contain different numbers of models; thus we subsequently divide each class’s prior uniformly among the models it contains. For example, in an ANOVA layout, one model class might represent the single mean model, where $P(m^{c})=\frac{1}{C}$ ; note we do not index this model by $s$ as there is no grouping structure present in this class. Alternatively, for a model class containing models with distinct regression effects by grouping scheme, the $S$ individual models within the class would be given the prior $P(m_{s}^{c})=\frac{1}{S\cdot C}$ . By implementing Bayes’ Theorem, posterior model probabilities approximated via the fractional Bayes factor (see Section 2.2.3) can then be easily computed for each individual model and class with the marginal probabilities of each model. For a given model $m_{c^{\prime}}^{s^{\prime}}$ we compute:

[TABLE]

We can then aggregate the overall probability of a given class $c^{\prime}$ as $P(m^{c^{\prime}}|\bm{Y})=\underset{s\in\mathcal{S}}{\sum}P(m_{s}^{c^{\prime}}|\bm{Y})$ , or for a given grouping scheme $s^{\prime}$ as $P(m_{s^{\prime}}|\bm{Y})=\underset{c\in\mathcal{C}}{\sum}P(m_{s^{\prime}}^{c}|\bm{Y})$ ; hence the proposed method allows researchers to draw probabilistic conclusions about both structures and specific grouping schemes.

2.2.3 Fractional Bayes Factor Approach

We next motivate the need to use a fractional Bayes factor approach. Consider a comparison between arbitrary models $m_{1}$ and $m_{2}$ , where $m_{1}$ and $m_{2}$ represent homoscedastic and heteroscedastic models, respectively, via $B^{12}=\frac{P(m_{1}|\boldsymbol{Y})P(m_{2})}{P(m_{2}|\boldsymbol{Y})P(m_{1})}$ . For $m_{1}$ we use $P(\boldsymbol{\varphi})=a\cdot\varphi^{-1}$ , and for $m_{2}$ we use $P(\boldsymbol{\varphi})=a^{\prime}\cdot\varphi_{1}^{-1}\cdot\varphi_{2}^{-1}$ for arbitrary constants $a\neq a^{\prime}$ . So the Bayes factor to compare $m_{1}$ and $m_{2}$ is

[TABLE]

which is defined only up to the arbitrary constant $\frac{a}{a^{\prime}}$ and thus inappropriate for use in model comparison. Note this problem arises from the use of noninformative priors on the precisions when comparing homoscedastic and heteroscedastic models. Fractional Bayes factors were developed to elicit a cancellation of this constant, rendering a Bayes factor that is well-defined (O’Hagan,, 1995).

For a description of fractional Bayes factors (FBFs), see O’Hagan, (1995) and O’Hagan, (1997). We use the FBF approach of O’Hagan rather than the intrinsic Bayes factor of Berger and Pericchi, (1996), as the need to choose a training sample would be complicated by the potential scarcity of data induced by some clustering schemes.

To fully quantify the fractional marginal probability of a given model $m_{i}$ , we must compute both $\int P(\bm{Y}|\boldsymbol{\theta}_{i},m_{i})\pi(\boldsymbol{\theta}_{i})d\boldsymbol{\theta}_{i}$ and $\int P^{b}(\bm{Y}|\boldsymbol{\theta}_{i},m_{i})\pi(\boldsymbol{\theta}_{i})d\boldsymbol{\theta}_{i}$ for some user-chosen fractional exponent $b$ . O’Hagan, (1995) provides several recommendations for $b$ , including $b:=\frac{m_{0}}{N}$ , where $m_{0}$ is the minimal training sample size necessary for $P^{b}(\bm{Y}|m_{i})$ to be proper. In this study we choose $b=\frac{m_{0}}{N}$ , which elicits consistent model selection (O’Hagan,, 1995). In many cases, tractable expressions exist for both $\int P(\bm{Y}|\boldsymbol{\theta}_{i},m_{i})\pi(\boldsymbol{\theta})d\boldsymbol{\theta}$ and $\int P^{b}(\bm{Y}|\boldsymbol{\theta}_{i},m_{i})\pi(\boldsymbol{\theta})d\boldsymbol{\theta}$ . When these integrals are intractable, we use the Laplace approximation in the computation of both $\int P(\bm{Y}|\boldsymbol{\theta}_{i},m_{i})\pi(\boldsymbol{\theta})d\boldsymbol{\theta}$ and $\int P^{b}(\bm{Y}|\boldsymbol{\theta}_{i},m_{i})\pi(\boldsymbol{\theta})d\boldsymbol{\theta}$ , as cubature-based approximation is computationally expensive and our study of the integrand surface indicates a suitable shape. See online Supplement for justification and further detail.

3 Applications

3.1 Application 1: ANCOVA Models

We first consider an ANCOVA scenario with continuous effect $\boldsymbol{\tau}$ and where the SLGF with $K+1$ levels corresponds to the single categorical predictor effect $\boldsymbol{\nu}$ . We let $V_{N\times 1}$ contain the observed continuous covariate and $W_{N\times K}$ be the appropriate categorical effect design matrix. We consider models with and without the interaction effect $\boldsymbol{\rho}$ , which governs whether the linear trends share a common slope. Thus we begin with the model given by

[TABLE]

yielding the likelihood function

[TABLE]

To consider the model without an interaction, let $\boldsymbol{\rho}:=\bm{0}$ . For models with a group effect, let $\boldsymbol{\nu}:=\tilde{\boldsymbol{\nu}}$ ; for models with a group-by-continuous predictor interaction, let $\boldsymbol{\rho}:=\tilde{\boldsymbol{\rho}}$ ; and finally for heteroscedastic models, let $\boldsymbol{\varepsilon}:=\tilde{\boldsymbol{\varepsilon}}$ . As an illustration, we consider eight distinct model classes.

Class I ( $m^{\text{I}}$ ): the "null" model, with no categorical or continuous covariate effects and homoscedastic error variance, contains 1 model with no grouping schemes;

$\bm{Y}=\bm{1}^{T}\alpha+\boldsymbol{\varepsilon},\boldsymbol{\varepsilon}\sim N(\bm{0},\,\sigma^{2}I)$ 2. 2.

Class II ( $m^{\text{II}}$ ): the "simple linear regression (SLR)" model, with a continuous covariate effect only and homoscedastic error variance, contains 1 model with no grouping schemes;

$\bm{Y}=\bm{1}^{T}\alpha+V\boldsymbol{\tau}+\boldsymbol{\varepsilon},\boldsymbol{\varepsilon}\sim N(\bm{0},\,\sigma^{2}I)$ 3. 3.

Class III ( $m^{\text{III}}$ ): the "ANCOVA" model with categorical and continuous covariate effects and homoscedastic error variance, contains 1 model with no grouping schemes;

$\bm{Y}=\bm{1}^{T}\alpha+W\boldsymbol{\nu}+V\boldsymbol{\tau}+\boldsymbol{\varepsilon},\boldsymbol{\varepsilon}\sim N(\bm{0},\,\sigma^{2}I)$ 4. 4.

Class IV ( $m_{s}^{\text{IV}}$ ): the "group-contracted ANCOVA" model with group and continuous covariate effects and homoscedastic error variance, contains $2^{K-1}-1$ schemes;

$\bm{Y}=\bm{1}^{T}\alpha+W\tilde{\boldsymbol{\nu}}+V\boldsymbol{\tau}+\boldsymbol{\varepsilon},\boldsymbol{\varepsilon}\sim N(\bm{0},\,\sigma^{2}I)$ 5. 5.

Class V ( $m^{\text{V}}$ ): the "interaction ANCOVA" model with categorical and continuous covariate effects, level-based interaction, and homoscedastic error variance, contains 1 model with no grouping scheme; $\bm{Y}=\bm{1}^{T}\alpha+W\boldsymbol{\nu}+V\boldsymbol{\tau}+U\boldsymbol{\rho}+\boldsymbol{\varepsilon},\ \boldsymbol{\varepsilon}\sim N(\bm{0},\,\sigma^{2}I)$ 6. 6.

Class VI ( $m_{s}^{\text{VI}}$ ): the "group-interaction" model, with group and continuous covariate effects, group-based interaction, and homoscedastic error variance, contains $2^{K-1}-1$ schemes;

$\bm{Y}=\bm{1}^{T}\alpha+W\tilde{\boldsymbol{\nu}}+V\boldsymbol{\tau}+U\tilde{\boldsymbol{\rho}}+\boldsymbol{\varepsilon},\ \boldsymbol{\varepsilon}\sim N(\bm{0},\,\sigma^{2}I)$ 7. 7.

Class VII ( $m_{s}^{\text{VII}}$ ): the "heteroscedastic group-contracted" model (heteroscedastic Class IV), with group and continuous covariate effects and heteroscedastic error variance, contains $2^{K-1}-1$ schemes;

$\bm{Y}=\bm{1}^{T}\alpha+W\tilde{\boldsymbol{\nu}}+V\boldsymbol{\tau}+\boldsymbol{\tilde{\varepsilon}},\ \boldsymbol{\tilde{\varepsilon}}\sim N\left(\bm{0},\,\tilde{\Sigma}=\left[\begin{array}[]{c|c}\sigma_{1}^{2}I_{n_{1}\times n_{1}}&0_{n_{1}\times n_{2}}\\ \hline\cr 0_{n_{2}\times n_{1}}&\sigma_{2}^{2}I_{n_{2}\times n_{2}}\end{array}\right]_{N\times N}\right)$ 8. 8.

Class VIII ( $m_{s}^{\text{VIII}}$ ): the "heteroscedastic group-interaction" model (heteroscedastic Class VI), with group and continuous covariate effects, group-based interaction, and heteroscedastic error variance, contains $2^{K-1}-1$ schemes;

$\bm{Y}=\bm{1}^{T}\alpha+W\tilde{\boldsymbol{\nu}}+V\boldsymbol{\tau}+U\tilde{\boldsymbol{\rho}}+\tilde{\boldsymbol{\varepsilon}},\ \boldsymbol{\tilde{\varepsilon}}\sim N\left(\bm{0},\,\tilde{\Sigma}=\left[\begin{array}[]{c|c}\sigma_{1}^{2}I_{n_{1}\times n_{1}}&0_{n_{1}\times n_{2}}\\ \hline\cr 0_{n_{2}\times n_{1}}&\sigma_{2}^{2}I_{n_{2}\times n_{2}}\end{array}\right]_{N\times N}\right)$

Thus the structure of no group-based effects is represented by Classes I, II, III, and V; group-based regression effects are represented by Classes IV, VI, VII, and VIII; group-based variances are represented by Classes VII and VIII.

We use the FBF approach outlined in Section 2.2.3. Typically, the most complex model considered in an ANCOVA study will have a small minimal training sample size relative to the overall sample size. Thus we choose noninformative priors on the regression effects and precision(s) as described in Equations (5) and (6), so the marginal density of the data conditional on the model is

[TABLE]

This marginal density is analytically integrable over all of the homoscedastic classes defined previously. In heteroscedastic cases (classes VII and VIII), we use a Laplace approximation over the log-variances to approximate $P(\bm{Y}|m_{s}^{\text{VII}})$ and $P(\bm{Y}|m_{s}^{\text{VIII}})$ . We demonstrate the performance of this approach through a simulation study in Section 4.1 and through empirical data sets in Section 5.1.

3.2 Application 2: Unreplicated Two-Way Layouts

Next we consider an unreplicated two-way layout with $R$ rows, $C$ columns, and $N=R\times C$ observations. Because of the unreplicated nature of such a design, the full set of standard interaction effects cannot be incorporated due to insufficient degrees of freedom. Treat the row effects as the SLGF $\boldsymbol{\nu}$ , and let $\boldsymbol{\tau}$ contain the column effects; note by transposing the data table we could treat the column effects as the SLGF as well.

Our choice of model classes is based on the idea of hidden additivity, where interactions are treated as a group-by-column effect (Franck et al.,, 2013). The usual “additive" model $\bm{Y}=\bm{1}^{T}\alpha+W\boldsymbol{\nu}+V\boldsymbol{\tau}+\boldsymbol{\varepsilon}$ accounts for only row and column main effects. The group-based model, partitioned by levels of the row effect, does not include column effects but does consider group-by-column interactions, denoted by $\tilde{\boldsymbol{\rho}}$ . We require at least 2 levels of $k$ (rows) in both groups to ensure there are enough degrees of freedom to estimate this group-by-column interaction. We thus begin with the model

[TABLE]

where we let $\tilde{\boldsymbol{\rho}}:=\bm{0}$ in the additive model, $\boldsymbol{\tau}:=\bm{0}$ in cases with scheme-based regression effects, and $\boldsymbol{\varepsilon}:=\tilde{\boldsymbol{\varepsilon}}$ in cases with group-based heteroscedasticity. So the full likelihood function is given by

[TABLE]

In this layout we consider four model classes:

Class I ( $m^{\text{I}})$ : the "additive" model, where column effects are equivalent across rows and error variance is constant across all observations, contains 1 model with no grouping schemes;

$\bm{Y}=\bm{1}^{T}\alpha+W\boldsymbol{\nu}+V\boldsymbol{\tau}+\boldsymbol{\varepsilon},\ \boldsymbol{\varepsilon}\sim N(\bm{0},\,\sigma^{2}I)$ 2. 2.

Class II ( $m_{s}^{\text{II}})$ : the "group-by-column interaction" model, where levels are divided into two latent groups based on grouping scheme ${d_{s}}(\bm{k})$ with distinct means and equivalent variance, contains $2^{K-1}-K-1$ grouping schemes (Franck et al.,, 2013);

$\bm{Y}=\bm{1}^{T}\alpha+W\boldsymbol{\nu}+U\tilde{\boldsymbol{\rho}}+\boldsymbol{\varepsilon},\ \boldsymbol{\varepsilon}\sim N(\bm{0},\,\sigma^{2}I)$ 3. 3.

Class III ( $m_{s}^{\text{III}})$ : the "heteroscedastic additive" model, where levels are divided into two latent groups based on grouping scheme ${d_{s}}(\bm{k})$ with equivalent means and group-based variances, contains $2^{K-1}-K-1$ grouping schemes;

$\bm{Y}=\bm{1}^{T}\alpha+W\boldsymbol{\nu}+V\boldsymbol{\tau}+\tilde{\boldsymbol{\varepsilon}},\ \tilde{\boldsymbol{\varepsilon}}\sim N\left(\bm{0},\,\tilde{\Sigma}=\left[\begin{array}[]{c|c}\sigma_{1}^{2}I_{n_{1}\times n_{1}}&0_{n_{1}\times n_{2}}\\ \hline\cr 0_{n_{2}\times n_{1}}&\sigma_{2}^{2}I_{n_{2}\times n_{2}}\end{array}\right]_{N\times N}\right)$ 4. 4.

Class IV ( $m_{s}^{\text{IV}})$ : the "heteroscedastic group-by-column interaction" model where levels are divided into two latent groups based on grouping scheme ${d_{s}}(\bm{k})$ with distinct means and group-based variances, contains $2^{K-1}-K-1$ grouping schemes;

$\bm{Y}=\bm{1}^{T}\alpha+W\boldsymbol{\nu}+U\tilde{\boldsymbol{\rho}}+\tilde{\boldsymbol{\varepsilon}},\ \tilde{\boldsymbol{\varepsilon}}\sim N\left(\bm{0},\,\tilde{\Sigma}=\left[\begin{array}[]{c|c}\sigma_{1}^{2}I_{n_{1}\times n_{1}}&0_{n_{1}\times n_{2}}\\ \hline\cr 0_{n_{2}\times n_{1}}&\sigma_{2}^{2}I_{n_{2}\times n_{2}}\end{array}\right]_{N\times N}\right)$

Thus there are $M=(3\times 2^{K-1})-3K-2$ models considered.

3.2.1 Bayesian Model Specification: Unreplicated Two-Way Layouts

The two-way unreplicated layout is typically modeled with a high ratio of parameters to data points. Thus we must take care in choosing priors that allow us to successfully incorporate the FBF approach. With noninformative priors on the regression effects, the minimal training sample size needed to estimate $R-1$ regression effects, $2\cdot(C-1)$ group-by-column interactions, and $2$ error variances would be prohibitively large in relation to the sample size $N$ ; indeed, $b=\frac{m_{0}}{N}<\frac{1}{2}$ only when $R\geq 7$ . In this work, we propose using the Zellner-Siow mixture $g$ -prior on regression coefficients to reduce the dimensionality of the improper prior. Our use of this automatic prior on the regression effects lowers the minimal training sample size to $m_{0}=3$ , and thus the fractional exponent $b=\frac{m_{0}}{N}$ , to a value that allows us to successfully implement the FBF. Thus we let

[TABLE]

The marginal density of the data conditional on the model is

[TABLE]

It is well-known in the homoscedastic case that the Zellner-Siow mixture $g$ -prior advantageously elicits a Cauchy distribution on the regression effects (Liang et al.,, 2008). We show the Cauchy result also holds in the heteroscedastic case; that is, $\boldsymbol{\beta}\sim\text{MVCauchy}_{p}\left(\text{location}=0,\,\text{scale}=\left(\frac{X^{T}\Phi X}{n}\right)^{-1}\right)$ ; see online Supplement for proof.

In classes with homoscedasticity, this intergral is intractable over $g$ and thus a Laplace approximation is conducted over a single dimension; in heteroscedastic cases, a three-dimensional Laplace approximation is used to integrate $g$ , $\lambda_{1}=\log(\varphi_{1})$ , and $\lambda_{2}=\log(\varphi_{2}$ ).

4 Simulation Studies

4.1 Simulation Study: ANCOVA Models

In order to simulate ANCOVA data, we generated independent draws $x$ uniformly over the interval $(0,10)$ . Outcomes $\bm{Y}$ were then simulated according to each of the eight classes described in Section 3.1 with continuous, categorical, interaction, and/or group-based effects, as well as errors with group-based heteroscedasticity, as appropriate. In this study, we simulated $n_{k}=90$ observations at each level of the SLGF; another simulation study with $n_{k}=10$ is provided in the Supplement. We note that the settings in Table 1 are not calibrated to be equivalent in the total effect between classes, where such a calibration would be non-trivial. For example, the Class V model contains eight non-null parameters compared to the Class I model’s single parameter; thus we avoid comparing overall performance between classes in this study.

Figure 8 shows that overall our method accurately attributes posterior model probability to the correct model class for the eight classes described in Section 3.1. For Class I (null model) data, the posterior probability of the true class is high relative to the other classes, indicating that our method neither misattributes noise to a group-based structure, nor appears to systematically favor one of the other model classes when it does select the wrong model. Class II (SLR model) data performs similarly, but Classes IV (group-contracted ANCOVA) and VII (group-interaction) capture some posterior probability as well. The Class III (ANCOVA) data setting is favored by the correct class, with Class V (interaction ANCOVA) being the second choice. Class IV (group-contracted ANCOVA) data appears to be the most difficult to detect based on the parameters chosen for this study; although its posterior probabilities are generally higher than the other classes’, it is often mistaken for Class III data, meaning in this case a group-based categorical effect is often mistaken for the full categorical effect. Note that these misattributed classes differ by only two parameters in the regression structure. For Class V (interaction ANCOVA) data, the correct model class is favored. Class VI (group-interaction) data is also correctly favored the majority of the time; when misclassified, it is usually chosen as Class V (interaction ANCOVA) or VIII (heteroscedastic ANCOVA). Classes VII (heteroscedastic group-contracted) and VIII (heteroscedastic group-interaction) perform well; when wrong, Class VII tends to favor a spurious interaction effect.

These results indicate that, in general, our method tends to identify the correct class. When no group-based structure is present, our method tends to not erroneously fit spurious effects or variances. Thus group-based regression effects, interactions, and/or variances are detected with high accuracy when present. We see similar performance with the additional study provided in the online Supplement where $N=40$ .

4.2 Simulation Study: Unreplicated Two-Way Layouts

Row and column effects along with error variance(s) (provided in Table 2) were simulated to generate unreplicated two-way layouts. We consider layouts of size $10\times 5$ where $N=50$ . Another setting with a smaller effect size, and a study on layouts of size $5\times 5$ , are given in the Supplement.

Figure 4 shows that our method also tends to favor the true class in the two-way unreplicated layout, with high parameter to data ratio and the mixture $g$ -prior. For additive data from Class I, the true class is generally favored; the most probable alternative is typically the heteroscedastic additive Class III. Under large effect size, Class II (group-by-column interaction) data is correctly identified. Class III (heteroscedastic additive) and Class IV (heteroscedastic group-by-column interaction) data are both favored correctly as well regarding both effect sizes.

5 Case Studies

5.1 Case Study: ANCOVA

We revisit the Flurry, (1939) data, shown in the center panel of Figure 1 and in Figure 5. The breaking strength of a chip coated with a film derived from one of three plant materials is studied as a function of a continuous predictor (the thickness of the film) and a categorical predictor (the plant type from which the film was developed). The plot seems to indicate some degree of heteroscedasticity between canna and corn, versus potato; results are shown in Figure 5. Two models receive 93% of the posterior probability: the heteroscedastic group-contracted model, with $P(m_{1,2:3}^{\text{VII}}|\bm{Y})\approx 0.51$ , and the heteroscedastic group-interaction model, with $P(m_{1,2:3}^{\text{VIII}}|\bm{Y})\approx 0.42$ .

5.2 Case Study: Two-Way Unreplicated Layouts

Franck et al., (2013) and Franck and Osborne, (2016) examine a two-way unreplicated layout describing the copy number variation for genes in six dogs with lymphoma. Samples were taken from healthy and diseased tissue within each dog. It is clear in the plot that dogs behave differently by group: dogs 1, 2, and 5 appear to behave distinctly from dogs 3, 4, and 6. With six subjects by row, there are $1+3\cdot(2^{6-1}-6-1)=76$ candidate models, including the null model and three classes each containing 25 models. Thus we show only the six most probable models, which account for 98.1% of the posterior probability; these results are shown in Figure 6. We conclude with high probability that subjects 1, 2, and 5 behave distinctly from 3, 4, and 6; this scheme over all classes has probability $P(m_{1,2,5:3,4,6}|\bm{Y})\approx.9680$ . More specifically, we also conclude homoscedastic behavior along with this grouping scheme, the group-by-column interaction model, with $P(m_{1,2,5:3,4,6}|\bm{Y})\approx.9040$ .

6 Discussion

Our proposed method is a flexible and intuitive approach to accommodate linear models with latent group-based effects underlying the data. This method generalizes the homoscedastic, two-way unreplicated layout work of Franck, (2018) to a heteroscedastic approach to any linear model with a categorical predictor. By partitioning the data according to the levels of a categorical predictor, we can detect latent structures within the data that might influence the regression effects, error variance, or interaction effects. Often, these group-based structures have straightforward and intuitive interpretations in the context of the data, making the approach particularly useful to domain experts. The use of fractional Bayes factors allows us to compare homoscedastic and heteroscedastic models with minimal prior influence. Regarding priors on the regression effects, we explored the performance of noninformative and mixture $g$ -priors in the contexts of ANCOVA and two-way unreplicated layouts, but many other choices on priors and data layouts can be considered. We considered cases in which the number of levels of the SLGF $K$ was relatively small, so that a combinatoric search over all possible grouping schemes was computationally feasible. In cases where $K$ is large, a Markov-chain model composition (MC3) could be used to search the model space.

In some cases, our method leads to a contraction of the number of effects modeled. For instance, in the ANCOVA examples illustrated in Section 3.1, estimating a group effect $\tilde{\boldsymbol{\nu}}$ as opposed to a categorical effect $\boldsymbol{\nu}$ will reduce the degrees of freedom used to estimate the effect from $K$ to $2$ . Alternatively, in the two-way unreplicated layout examples, modeling a group-by-column interaction rather than column effects will expand the degrees of freedom used from $C-1$ to $2\cdot(C-1)$ .

While we have illustrated our method using two latent groups, an extension to three or more groups is straightforward conceptually but increases the complexity of the models, the number of model classes, and the number of models to consider. Fortunately, the two-group assumption has been shown to lead to useful inferences in several previous works, including Kharrati-Kopaei and Sadooghi-Alvandi, (2007), Franck et al., (2013), Franck and Osborne, (2016), and Franck, (2018).

7 Online Supplement

7.1 Marginal Model Probability Calculations

To prevent numeric underflow we consider model probabilites on the logarithmic scale in all cases. Consider the set of all log-marginal probabilities $\mathcal{L}=\{\log P(Y|m_{s}^{c})\}$ . Let $\ell^{\star}=\max\mathcal{L}$ and $m^{\star}=\underset{m}{\arg\max}\ \mathcal{L}$ . Transform $\mathcal{L}^{\star}=\mathcal{L}-\ell^{\star}$ to obtain the set $\exp[\mathcal{L}^{\star}]=\{\exp[\log P(Y|m_{s}^{c})-\log P(Y|m^{\star})]\}=\{\frac{P(Y|m_{s}^{c})}{P(Y|m^{\star})}\}=\{B^{m_{s}^{c},\,m^{\star}}\}$ , representing Bayes factors for each model relative to the most probable model. Note the untransformed model probabilities are given by $P=\{\frac{P(Y|m_{s}^{c})P(m_{s}^{c})}{\sum P(Y|m_{s}^{c})P(m_{s}^{c})}\}=\{\frac{\nicefrac{{P(Y|m_{s}^{c})P(m_{s}^{c})}}{{P(Y|m^{\star})}}}{\nicefrac{{\sum P(Y|m_{s}^{c})P(m_{s}^{c})}}{{P(Y|m^{\star})}}}\}=\{\frac{B^{m_{s}^{c},\,m^{\star}}P(m_{s}^{c})}{\sum B^{m_{s}^{c},\,m^{\star}}P(m_{s}^{c})}\}$ ; thus we can easily use these transformed log-marginal probabilities to obtain posterior model probabilities. Our Laplace approximations generally provided values comparable to quadrature and cubature-based approximations at less computational expense.

7.2 Derivation of Model Probabilities: Noninformative Regression Effect Priors

Let $H_{X}:=X(X^{T}X)^{-1}X^{T}$ .

7.2.1 Homoscedastic Error Variance, No Group-Based Regression Effects

[TABLE]

7.2.2 Homoscedastic Error Variance, Group-Based Regression Effects

[TABLE]

7.2.3 Heteroscedastic Error Variance

A Laplace approximation is used to evaluate $\int P(\bm{Y}|\boldsymbol{\varphi},m)P(\boldsymbol{\varphi})d\boldsymbol{\varphi}$ , parametrized with respect to the log-variances $\lambda_{1}=\log{\sigma^{2}_{1}}$ and $\lambda_{2}=\log{\sigma^{2}_{2}}$ . Denote $\Lambda$ as the log-variance matrix, $J_{\Lambda}$ as the transformation Jacobian, $(\lambda_{1}^{\star},\,\lambda_{2}^{\star})$ as the mode of the log-marginal distribution, and $\nabla^{\star}$ as the Hessian evaluated at this mode. A subscript of $b$ refers to the same quantities calculated with respect to the fractional exponentiated likelihood. The joint modes $(\lambda_{1}^{\star},\,\lambda_{2}^{\star})$ and $(\lambda_{b_{1}}^{\star},\,\lambda_{b_{2}}^{\star})$ are computed using the function optim in R; similarly, the Hessians $\nabla^{\star}$ and $\nabla_{b}^{\star}$ are evaluated at these values using the function hessian in the package numderiv (Gilbert and Varadhan,, 2016; R Core Team,, 2017).

We first integrate over the regression effects vector $\boldsymbol{\beta}$ . Let $H_{\Phi}:=\Phi X(X^{T}\Phi X)^{-1}X^{T}\Phi$ .

[TABLE]

We reparametrize the precisions of $P^{b}(\bm{Y}|m_{s}^{\text{III}},\varphi_{1},\varphi_{2})$ and $P(\bm{Y}|m_{s}^{\text{III}},\varphi_{1},\varphi_{2})$ to log-variances $\lambda_{1}$ and $\lambda_{2}$ to elicit a shape more conducive to the Laplace approximation.

For $\lambda_{1}=\ln{\varphi_{1}^{-1}}$ and $\lambda_{2}=\ln{\varphi_{2}^{-1}}$ , $P(\bm{Y}|m_{s}^{\text{III}},\lambda_{1},\lambda_{2})=P(\bm{Y}|m_{s}^{\text{III}},\varphi_{1},\varphi_{2})\cdot|J_{\Lambda}|$ where $\frac{\partial\varphi_{1}}{\partial\lambda_{1}}=-\exp\{-\lambda_{1}\}$ and $\frac{\partial\varphi_{2}}{\partial\lambda_{2}}=-\exp\{-\lambda_{2}\}$ , so $|J_{\Lambda}|=\exp\{-(\lambda_{1}+\lambda_{2})\}$ .

Thus the Laplace approximation for the fractional marginal model probability is given by

[TABLE]

Empirical comparisons with cubature-based approximations, in addition to the shapes of the marginal densities $P(\bm{Y}|m_{s}^{\text{III}},\lambda_{1},\lambda_{2})$ and $P^{b}(\bm{Y}|m_{s}^{\text{III}},\lambda_{1},\lambda_{2})$ , indicate that this Laplace approximation is satisfactory.

7.2.4 Special Case: Heteroscedastic Error Variance with no Regression Effects Spanning Two Variances

[TABLE]

7.3 Derivation of Model Probabilities: Mixture $\bm{g}$ Regression Effect Priors

[TABLE]

After integrating over $\alpha,\,\boldsymbol{\beta}$ , and $\boldsymbol{\varphi}$ , we obtain

[TABLE]

To execute the Laplace approximation over $g$ , we must obtain the mode $g^{\star}=\underset{g}{\text{argmax}}\ P(\bm{Y},g|m_{s}^{c})$ , the root of the equation

[TABLE]

where $Q=1-R^{2}$ . We also require the Hessian evaluated at the mode,

[TABLE]

These expressions are appropriate to use in homoscedastic classes with either global or distinct regression effects; we simply compute the corresponding $R^{2}$ and use the appropriate $P$ based on the model under consideration.

In classes with heteroscedasticity, the integral is intractable over both $\boldsymbol{\varphi}$ and $g$ ; thus we must employ a three-dimensional Laplace approximation to evaluate this integral. For computational ease and to improve the accuracy of the approximation, we again parametrize with respect to the log-variance; let $\Lambda$ represent the log-variance matrix. Denote $H_{\Lambda}=\Lambda\bm{1}(\bm{1}^{T}\Lambda\bm{1})^{-1}\bm{1}^{T}\Lambda$ ; then integrating out the global intercept and regression effects yields an expression for $P^{b}(\bm{Y},\lambda_{1},\lambda_{2},g|m_{s}^{c})$ :

[TABLE]

The joint mode $(\lambda_{1}^{\star},\,\lambda_{2}^{\star},\,g^{\star})$ is computed using the function optim in R; similarly, the Hessian is computed at this value using the function hessian in the package numderiv.

7.3.1 Heteroscedastic Zellner-Siow Cauchy Result

Let $\boldsymbol{\beta}|\Phi,\,g\sim N(\bm{0},\,g(X^{T}\Phi X)^{-1})$ and $g\sim\text{IG}(\frac{1}{2},\,\frac{N}{2})$ . Then $P(\boldsymbol{\beta}|g,\,\Phi)=(2\pi)^{-\frac{P}{2}}|g(X^{T}\Phi X)^{-1}|^{-\frac{1}{2}}\exp\{-\frac{1}{2}\boldsymbol{\beta}\frac{X^{T}\Phi X}{g}\boldsymbol{\beta}\}$ and $P(g)=\frac{\left(\frac{N}{2}\right)^{\frac{1}{2}}}{\Gamma\left(\frac{1}{2}\right)}g^{-\frac{3}{2}}\exp\{-\frac{N}{2g}\}$ .

[TABLE]

7.4 Supplemental ANCOVA Simulation Study

We let $n_{k}:=10$ for each level of the SLGF. The parameter settings are provided in Table 1.

7.5 Supplemental Two-way Layout Simulation Study

We provide three additional simulation studies in the twoway layout scenario: $10\times 5$ layouts with a smaller effect size than the study provided in Section 4.2, as well as $5\times 5$ studies with larger and smaller effect sizes.

Bibliography51

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Bartlett, (1937) Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proceedings of the Royal Statistical Society, Series A , 160(901):268–282.
2Berger and Pericchi, (1996) Berger, J. O. and Pericchi, L. R. (1996). The intrinsic bayes factor for model selection and prediction. Journal of the American Statistical Association , 91(433):109–122.
3Berry, (1993) Berry, W. D. (1993). Quantitative Applications in the Social Sciences: Understanding Regression Assumptions . SAGE Publications Ltd., Thousand Oaks, CA.
4Bickel, (1978) Bickel, P. J. (1978). Using residuals robustly i: Tests for heteroscedasticity, nonlinearity. Annals of Statistics , 6:266–291.
5Boscardin and Gelman, (1994) Boscardin, W. J. and Gelman, A. (1994). Bayesian computation for parametric models of heteroscedasticity in the linear model.
6Box and Cox, (1964) Box, G. E. and Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society , 26:211–252.
7Box and Hill, (1974) Box, G. E. P. and Hill, W. J. (1974). Correcting inhomogeneity of variance with power transformation weighting. Technometrics , 16:385–389.
8Breusch and Pagan, (1979) Breusch, T. S. and Pagan, A. R. (1979). A simple test for heteroscedasticity and random coefficient variation. Econometrica (pre-1986) , 47(5):1287. Copyright - Copyright Econometric Society Sep 1979; Last updated - 2011-10-07; CODEN - ECMTA 7.

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

Detection of latent heteroscedasticity and group-based regression effects in linear models via Bayesian model selection

Abstract

1 Introduction

2 Proposed Method

2.1 Specification of Linear Models with Categorical Predictors

2.2 Bayesian Model Selection Details

2.2.1 Model Specification

2.2.2 Model Priors for Classes and Schemes

2.2.3 Fractional Bayes Factor Approach

3 Applications

3.1 Application 1: ANCOVA Models

3.2 Application 2: Unreplicated Two-Way Layouts

3.2.1 Bayesian Model Specification: Unreplicated Two-Way Layouts

4 Simulation Studies

4.1 Simulation Study: ANCOVA Models

4.2 Simulation Study: Unreplicated Two-Way Layouts

5 Case Studies

5.1 Case Study: ANCOVA

5.2 Case Study: Two-Way Unreplicated Layouts

6 Discussion

7 Online Supplement

7.1 Marginal Model Probability Calculations

7.2 Derivation of Model Probabilities: Noninformative Regression Effect Priors

7.2.1 Homoscedastic Error Variance, No Group-Based Regression Effects

7.2.2 Homoscedastic Error Variance, Group-Based Regression Effects

7.2.3 Heteroscedastic Error Variance

7.2.4 Special Case: Heteroscedastic Error Variance with no Regression Effects Spanning Two Variances

7.3 Derivation of Model Probabilities: Mixture g\bm{g}g Regression Effect Priors

7.3.1 Heteroscedastic Zellner-Siow Cauchy Result

7.4 Supplemental ANCOVA Simulation Study

7.5 Supplemental Two-way Layout Simulation Study

7.3 Derivation of Model Probabilities: Mixture $\bm{g}$ Regression Effect Priors