Posterior Robustness with Milder Conditions: Contamination Models Revisited

Yasuyuki Hamura; Kaoru Irie; Shonosuke Sugasawa

arXiv:2303.00281·stat.ME·September 23, 2025


Open Access

TL;DR

This paper revisits classical contamination models in robust Bayesian linear regression, providing new conditions for posterior robustness and demonstrating that even Student-t errors can achieve robustness under milder assumptions.

Contribution

It introduces new sufficient conditions for posterior robustness in contamination models, expanding the class of error distributions that ensure robustness.

Findings

1. Student-t errors can achieve posterior robustness.

2. New conditions for robustness are less restrictive.

3. Numerical study confirms robustness with outliers.

Abstract

Robust Bayesian linear regression is a classical but essential statistical tool. Although novel robustness properties of posterior distributions have recently been proved under a certain class of error distributions, the sufficient conditions are restrictive and exclude several important situations. In this work, we revisit a classical two-component mixture model for response variables, also known as the contamination model, where one component is a light-tailed regression model and the other component is heavy-tailed. The latter component is independent of the regression parameters, which is crucial in proving posterior robustness. We obtain new sufficient conditions for posterior (non-)robustness and reveal non-trivial robustness results by using those conditions. In particular, we find that even the Student-$t$ error distribution can achieve posterior robustness in our…

Tables

Table 1: Priors and conditions in Theorems 1 and 2

Prior for $\sigma^{2}$	Density $\pi(\sigma)d\sigma$	Condition (5) for robustness	Condition (6) for non-robustness
Inverse-gamma: $IG(A,B)$	$(1/\sigma^{2A+1})\exp(-B/\sigma^{2})$	$2A>|{\cal L}|\alpha$	$2A<|{\cal L}|\alpha$
Gamma: $Ga(C,D)$	$\sigma^{2C-1}\exp(-D\sigma^{2})$	✓	NA
Scaled-beta: $SB(E,F)$	$\sigma^{2E-1}/(1+\sigma^{2})^{E+F}$	$2F>|{\cal L}|\alpha$	$2F<|{\cal L}|\alpha$

Table 2: Sufficient conditions of model components for robustness

Source	Number of outliers $|{\cal L}|$	Error density tails ($f$ or $f_{1}$)	Prior density $\pi({\text{\boldmath$\beta$}},\sigma)$: density bounds	Prior density: moments	Prior density: improper
Gagnon et al. (2019)	$|{\cal K}|\geq|{\cal L}|+2p-1$	LRVD	$\max\{1,1/\sigma\}$	–	✓
Hamura et al. (2020)	$|{\cal K}|\geq|{\cal L}|+p$	LRVD	$\sup_{t\in\mathbb{R}}|t|\pi_{\beta}(t)<\infty$	$E[\sigma^{-n}]<\infty$	NA
Theorem 1 of this study	Not needed	$\frac{1}{(1+|y|)^{1+\alpha}}$	$\prod_{k=1}^{p}\frac{(|\beta_{k}|/\sigma)^{\kappa-1}/\sigma}{(1+|\beta_{k}|/\sigma)^{\kappa+\nu}}$	$E[\sigma^{|{\cal L}|\alpha+\rho}]<\infty$	NA

Equations

$$y_{i}\sim f((y_{i}-{\text{\boldmath$x$}}_{i}^{\top}{\text{\boldmath$\beta$}})/{\sigma})/{\sigma},\quad i=1,\dots,n,$$

$$y_{i}\sim(1-s)f_{0}((y_{i}-{\text{\boldmath$x$}}_{i}^{\top}{\text{\boldmath$\beta$}})/{\sigma})/{\sigma}+sf_{1}(y_{i}),\quad i=1,\dots,n,$$

$$p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}})\to\pi({\text{\boldmath$\beta$}},{\sigma})\,{\sigma}^{{\alpha}}\prod_{i=2}^{n}\frac{f((y_{i}-{\text{\boldmath$x$}}_{i}^{\top}{\text{\boldmath$\beta$}})/{\sigma})}{{\sigma}}$$

$$p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}})\to\pi({\text{\boldmath$\beta$}},{\sigma})\prod_{i=2}^{n}\Big\{(1-s)\frac{f_{0}((y_{i}-{\text{\boldmath$x$}}_{i}^{\top}{\text{\boldmath$\beta$}})/{\sigma})}{{\sigma}}+sf_{1}(y_{i})\Big\}$$

$$y_{i}\sim(1-s){\rm N}(y_{i}|{\text{\boldmath$x$}}_{i}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})+sf_{1}(y_{i})$$

$$\pi({\text{\boldmath$\beta$}}|{\sigma})=\frac{\pi({\text{\boldmath$\beta$}},{\sigma})}{\pi({\sigma})}\leq M\prod_{k=1}^{p}\Big\{\frac{1}{{\sigma}}\frac{(|{\beta}_{k}|/{\sigma})^{{\kappa}-1}}{(1+|{\beta}_{k}|/{\sigma})^{{\kappa}+\nu}}\Big\},$$

$$f_{1}(y)$$

$$E[{\sigma}^{|{\cal L}|{\alpha}+\rho}]<\infty$$

$$\lim_{{\omega}\to\infty}p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}})=p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}}_{{\cal K}})$$

$$f_{1}(y)$$

$$\pi({\sigma})\geq\frac{1/{\widetilde{M}}}{{\sigma}^{|{\cal L}|{\alpha}+1-\rho}}$$

$$\lim_{{\omega}\to\infty}p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}})=0$$

$$\text{KL}=\int_{\mathbb{R}^{p}\times(0,\infty)}p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}}_{{\cal K}})\log\frac{p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}}_{{\cal K}})}{p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}})}\,d({\text{\boldmath$\beta$}},{\sigma}).$$

$$\pi({\text{\boldmath$\beta$}},{\sigma})$$

$$f_{1}^{\rm{light}}(y)=\frac{{\alpha}/2}{(1+|y|)^{1+{\alpha}}},\quad y\in\mathbb{R},$$

$$f_{1}^{\rm{heavy}}(y)=\frac{{\gamma}/2}{1+|y|}\frac{1}{\{1+\log(1+|y|)\}^{1+{\gamma}}},\quad y\in\mathbb{R},$$

$$\begin{pmatrix}{\text{\boldmath$x$}}_{1}^{\top}\\ \vdots\\ {\text{\boldmath$x$}}_{5}^{\top}\end{pmatrix}$$

$$\mathbb{R}^{p}$$

$$\begin{aligned}
&\cup\bigcup_{l=1}^{p}\bigcup_{1\leq i_{1}<\dots<i_{l}\leq m}\Big(\Big(\bigcap_{i\in\{i_{1},\dots,i_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{\text{\boldmath$x$}}_{i}^{\top}{\text{\boldmath$\beta$}}|\leq{\varepsilon}{\omega}\}\Big)\\
&\quad\cap\Big(\bigcap_{i\in\{1,\dots,m\}\setminus\{i_{1},\dots,i_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|a_{i}+b_{i}{\omega}-{\text{\boldmath$x$}}_{i}^{\top}{\text{\boldmath$\beta$}}|>{\varepsilon}{\omega}\}\Big)\\
&\quad\cap\Big\{\bigcup_{1\leq k_{1}<\dots<k_{l}\leq p}\bigcap_{k\in\{k_{1},\dots,k_{l}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|({\text{\boldmath$e$}}_{k}^{(p)})^{\top}{\text{\boldmath$\beta$}}|\geq{\delta}{\omega}\}\Big\}\\
&\quad\cap\Big[\Big(\bigcap_{j\in{\widehat{J}}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|c_{j}-{\text{\boldmath$z$}}_{j}^{\top}{\text{\boldmath$\beta$}}|>\eta\}\Big)\\
&\quad\cup\bigcup_{1\leq q\leq p-l}\bigcup_{\substack{j_{1},\dots,j_{q}\in{\widehat{J}}\\ j_{1}<\dots<j_{q}}}\Big\{\Big(\bigcap_{j\in\{j_{1},\dots,j_{q}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|c_{j}-{\text{\boldmath$z$}}_{j}^{\top}{\text{\boldmath$\beta$}}|\leq\eta\}\Big)\\
&\quad\cap\Big(\bigcap_{j\in{\widehat{J}}\setminus\{j_{1},\dots,j_{q}\}}\{{\text{\boldmath$\beta$}}\in\mathbb{R}^{p}\mid|c_{j}-{\text{\boldmath$z$}}_{j}^{\top}{\text{\boldmath$\beta$}}|>\eta\}\Big)\Big\}\Big]\Big).
\end{aligned}$$

$$|a_{i}+b_{i}{\omega}-{\text{\boldmath$x$}}_{i}^{\top}{\text{\boldmath$\beta$}}|\leq{\varepsilon}{\omega}\quad\text{if and only if}\quad{\text{\boldmath$x$}}_{i}^{\top}{\text{\boldmath$\beta$}}/{\omega}\in[a_{i}/{\omega}+b_{i}\pm{\varepsilon}].$$

$$\Big(\max_{1\leq i\leq m}\|{\text{\boldmath$x$}}_{i}\|\Big)\Big(p\max_{1\leq k\leq p}|{\beta}_{k}|\Big)/{\omega}\geq|a_{i}/{\omega}+b_{i}+s|\geq|b_{i}|-|a_{i}|/{\omega}-{\varepsilon}$$

$$\begin{pmatrix}{\widetilde{\text{\boldmath$X$}}}({\text{\boldmath$i$}},{\text{\boldmath$k$}})&\widetilde{{\widetilde{\text{\boldmath$X$}}}}({\text{\boldmath$i$}},{\text{\boldmath$k$}})\\ \{{\tilde{\text{\boldmath$x$}}}({\text{\boldmath$i$}},{\text{\boldmath$k$}})\}^{\top}&\{\tilde{{\tilde{\text{\boldmath$x$}}}}({\text{\boldmath$i$}},{\text{\boldmath$k$}})\}^{\top}\end{pmatrix}=\begin{pmatrix}{\text{\boldmath$x$}}_{i_{1}}^{\top}\\ \vdots\\ {\text{\boldmath$x$}}_{i_{l^{\prime}+1}}^{\top}\end{pmatrix}\big\{({\text{\boldmath$e$}}_{k_{1}}^{(p)},\dots,{\text{\boldmath$e$}}_{k_{l^{\prime}}}^{(p)},{\text{\boldmath$e$}}_{{\tilde{k}}_{1}({\text{\boldmath$k$}})}^{(p)},\dots,{\text{\boldmath$e$}}_{{\tilde{k}}_{p-l^{\prime}}({\text{\boldmath$k$}})}^{(p)})^{\top}\big\}^{-1}\quad\text{and}\quad{\widetilde{\text{\boldmath$X$}}}({\text{\boldmath$i$}},{\text{\boldmath$k$}})\in\mathbb{R}^{l^{\prime}\times l^{\prime}}.$$

$$M=\max\Big\{M^{\prime},\max_{({\text{\boldmath$i$}},{\text{\boldmath$k$}})\in{\cal I}\times{\cal K}}3\frac{|(-\{{\tilde{\text{\boldmath$x$}}}({\text{\boldmath$i$}},{\text{\boldmath$k$}})\}^{\top}\{{\widetilde{\text{\boldmath$X$}}}({\text{\boldmath$i$}},{\text{\boldmath$k$}})\}^{-1},1){\text{\boldmath$a$}}({\text{\boldmath$i$}})|}{|(-\{{\tilde{\text{\boldmath$x$}}}({\text{\boldmath$i$}},{\text{\boldmath$k$}})\}^{\top}\{{\widetilde{\text{\boldmath$X$}}}({\text{\boldmath$i$}},{\text{\boldmath$k$}})\}^{-1},1){\text{\boldmath$b$}}({\text{\boldmath$i$}})|}\Big\},$$

$${\varepsilon}=\min\Big\{{\varepsilon}^{\prime},\min_{({\text{\boldmath$i$}},{\text{\boldmath$k$}})\in{\cal I}\times{\cal K}}\frac{1}{3}\frac{1}{\sqrt{p+1}}\frac{|(-\{{\tilde{\text{\boldmath$x$}}}({\text{\boldmath$i$}},{\text{\boldmath$k$}})\}^{\top}\{{\widetilde{\text{\boldmath$X$}}}({\text{\boldmath$i$}},{\text{\boldmath$k$}})\}^{-1},1){\text{\boldmath$b$}}({\text{\boldmath$i$}})|}{\|(-\{{\tilde{\text{\boldmath$x$}}}({\text{\boldmath$i$}},{\text{\boldmath$k$}})\}^{\top}\{{\widetilde{\text{\boldmath$X$}}}({\text{\boldmath$i$}},{\text{\boldmath$k$}})\}^{-1},1)\|}\Big\},\quad\text{and}$$

$${\delta}=\min\Big\{{\delta}^{\prime},\min_{({\text{\boldmath$i$}},{\text{\boldmath$k$}})\in{\cal I}\times{\cal K}}\frac{1}{3}\frac{1}{\sqrt{p}}\frac{|(-\{{\tilde{\text{\boldmath$x$}}}({\text{\boldmath$i$}},{\text{\boldmath$k$}})\}^{\top}\{{\widetilde{\text{\boldmath$X$}}}({\text{\boldmath$i$}},{\text{\boldmath$k$}})\}^{-1},1){\text{\boldmath$b$}}({\text{\boldmath$i$}})|}{\|-\{{\tilde{\text{\boldmath$x$}}}({\text{\boldmath$i$}},{\text{\boldmath$k$}})\}^{\top}\{{\widetilde{\text{\boldmath$X$}}}({\text{\boldmath$i$}},{\text{\boldmath$k$}})\}^{-1}\widetilde{{\widetilde{\text{\boldmath$X$}}}}({\text{\boldmath$i$}},{\text{\boldmath$k$}})+\{\tilde{{\tilde{\text{\boldmath$x$}}}}({\text{\boldmath$i$}},{\text{\boldmath$k$}})\}^{\top}\|}\Big\}.$$

$$\begin{pmatrix}{\widetilde{\text{\boldmath$X$}}}&\widetilde{{\widetilde{\text{\boldmath$X$}}}}\\ {\tilde{\text{\boldmath$x$}}}^{\top}&\tilde{{\tilde{\text{\boldmath$x$}}}}^{\top}\end{pmatrix}{\text{\boldmath$E$}}{\text{\boldmath$\beta$}}/{\omega}={\text{\boldmath$a$}}/{\omega}+{\text{\boldmath$b$}}+{\text{\boldmath$s$}},$$

$$\begin{pmatrix}{\widetilde{\text{\boldmath$X$}}}&\widetilde{{\widetilde{\text{\boldmath$X$}}}}\\ {\tilde{\text{\boldmath$x$}}}^{\top}&\tilde{{\tilde{\text{\boldmath$x$}}}}^{\top}\end{pmatrix}=\begin{pmatrix}{\text{\boldmath$x$}}_{i_{1}}^{\top}\\ \vdots\\ {\text{\boldmath$x$}}_{i_{l^{\prime}+1}}^{\top}\end{pmatrix}{\text{\boldmath$E$}}^{-1}\quad\text{and}\quad{\widetilde{\text{\boldmath$X$}}}\in\mathbb{R}^{l^{\prime}\times l^{\prime}}$$

$$\begin{pmatrix}{\text{\boldmath$I$}}^{(l^{\prime})}&{\widetilde{\text{\boldmath$X$}}}^{-1}\widetilde{{\widetilde{\text{\boldmath$X$}}}}\\ ({\bf 0}^{(l^{\prime})})^{\top}&-{\tilde{\text{\boldmath$x$}}}^{\top}{\widetilde{\text{\boldmath$X$}}}^{-1}\widetilde{{\widetilde{\text{\boldmath$X$}}}}+\tilde{{\tilde{\text{\boldmath$x$}}}}^{\top}\end{pmatrix}{\text{\boldmath$E$}}{\text{\boldmath$\beta$}}/{\omega}=\begin{pmatrix}{\widetilde{\text{\boldmath$X$}}}^{-1}&{\bf 0}^{(l^{\prime})}\\ -{\tilde{\text{\boldmath$x$}}}^{\top}{\widetilde{\text{\boldmath$X$}}}^{-1}&1\end{pmatrix}({\text{\boldmath$a$}}/{\omega}+{\text{\boldmath$b$}}+{\text{\boldmath$s$}}).$$


Taxonomy

Topics: Advanced Statistical Methods and Models · Advanced Statistical Process Monitoring · Statistical Methods and Inference

Full text

Posterior Robustness with Milder Conditions: Contamination Models Revisited

Yasuyuki Hamura¹, Kaoru Irie² and Shonosuke Sugasawa³

¹ Corresponding author. Graduate School of Economics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto, 606-8501, JAPAN. E-Mail: [email protected]

² Faculty of Economics, The University of Tokyo. E-Mail: [email protected]

³ Center for Spatial Information Science, The University of Tokyo. E-Mail: [email protected]

Abstract

Robust Bayesian linear regression is a classical but essential statistical tool. Although novel robustness properties of posterior distributions have recently been proved under a certain class of error distributions, the sufficient conditions are restrictive and exclude several important situations. In this work, we revisit a classical two-component mixture model for response variables, also known as the contamination model, where one component is a light-tailed regression model and the other component is heavy-tailed. The latter component is independent of the regression parameters, which is crucial in proving posterior robustness. We obtain new sufficient conditions for posterior (non-)robustness and reveal non-trivial robustness results by using those conditions. In particular, we find that even the Student-$t$ error distribution can achieve posterior robustness in our framework. A numerical study is performed to check the Kullback-Leibler divergence between the posterior distribution based on the full data and that based on the data obtained by removing outliers.

Keywords: heavy-tailed distribution; posterior robustness; two-component mixture

Introduction

Bayesian posterior robustness (O’Hagan, 1979) and related topics have long been studied (e.g., West, 1984; Andrade and O’Hagan, 2006, 2011; O’Hagan and Pericchi, 2012). One of the most important objectives in this literature is to perform posterior analysis using only the moderate observations, discarding outliers that are unrelated to the parameters of interest. Because manually detecting outliers is difficult in general, robust models under which the effects of outliers are removed automatically are desirable.

Although many robust regression models have been proposed in the literature, few works (e.g., O’Hagan, 1979) have given theoretical justifications for those models. In fact, it is only recently that Desgagné (2013, 2015) and Gagnon et al. (2019) proved posterior robustness for scale, location-scale, and regression models, respectively. Here, a posterior density is said to be robust if it converges to the corresponding conditional density of the parameters based only on the non-outliers as the absolute values of the outliers tend to infinity. Since then, posterior robustness has been established in various practically important settings; Hamura et al. (2022) obtained robustness results for regressions with shrinkage priors, whereas Hamura et al. (2021) considered the case of integer-valued observations.

In proving posterior robustness, Gagnon et al. (2019) and Hamura et al. (2022) considered the following model: with observations $y_{1},\dots,y_{n}$, $p$-dimensional covariate vectors ${\text{\boldmath$ x $}}_{1},\dots,{\text{\boldmath$ x $}}_{n}$, regression coefficients ${\text{\boldmath$ \beta $}}\in\mathbb{R}^{p}$ and a scale parameter ${\sigma}\in(0,\infty)$, they assume

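$$y_{i}\sim f((y_{i}-{\text{\boldmath$x$}}_{i}^{\top}{\text{\boldmath$\beta$}})/{\sigma})/{\sigma},\quad i=1,\dots,n,\tag{1}$$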

for some error density $f$ and prior $({\text{\boldmath$ \beta $}},{\sigma})\sim\pi({\text{\boldmath$ \beta $}},{\sigma})$. In their proofs, it is crucial to assume a log-regularly varying error density, which has heavier tails than Student’s $t$-distributions, to ensure posterior robustness. If $f$ is a Student’s $t$-density, the posterior is not robust (Gagnon and Hayashi, 2023). These theoretical findings suggest that log-regularly varying error densities are superior to Student’s $t$-distributions. However, several numerical studies have reported that the Student’s $t$ error distribution is fairly competitive in posterior inference (Hamura et al., 2022).

In this paper, we revisit the following classical two-component mixture regression model, also known as the contamination model:

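$$y_{i}\sim(1-s)f_{0}((y_{i}-{\text{\boldmath$x$}}_{i}^{\top}{\text{\boldmath$\beta$}})/{\sigma})/{\sigma}+sf_{1}(y_{i}),\quad i=1,\dots,n,\tag{2}$$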

where $({\text{\boldmath$ \beta $}},{\sigma})\sim\pi({\text{\boldmath$ \beta $}},{\sigma})$ and $s\in(0,1)$ is the prior probability that an observation is an outlier. The first density, $f_{0}$, has thinner tails and is typically the standard normal density. The second density, $f_{1}$, is heavy-tailed, such as a Student’s $t$-density, and is expected to accommodate outliers. One notable feature of the above model is that the second term is completely independent of the parameters $({\text{\boldmath$ \beta $}},{\sigma})$. This is a significant difference from the classical two-component mixtures in Box and Tiao (1968) and subsequent research (Tak et al., 2019; Silva et al., 2020), where the second component is also scaled by the observational standard error $\sigma$.
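To make the structure of the contamination model concrete, the following minimal sketch draws data from it, assuming a standard normal $f_{0}$ and a Student's $t$ density for $f_{1}$. The function name `simulate_contamination`, the design matrix, and all parameter values are illustrative choices, not taken from the paper.

```python
import math
import random


def simulate_contamination(X, beta, sigma, s, df, rng):
    """Draw y_i from (1 - s) * N(x_i' beta, sigma^2) + s * t_df, independently.

    Hypothetical helper for illustration only: f_0 is taken to be standard
    normal and f_1 a Student's t density with `df` degrees of freedom.
    """
    y, is_outlier = [], []
    for x in X:
        from_f1 = rng.random() < s  # latent indicator of the f_1 component
        if from_f1:
            # Student's t draw Z / sqrt(chi2_df / df); it ignores (beta, sigma).
            z = rng.gauss(0.0, 1.0)
            chi2 = rng.gammavariate(df / 2.0, 2.0)
            value = z / math.sqrt(chi2 / df)
        else:
            # Regression component: location x_i' beta and scale sigma.
            mean = sum(xk * bk for xk, bk in zip(x, beta))
            value = mean + sigma * rng.gauss(0.0, 1.0)
        y.append(value)
        is_outlier.append(from_f1)
    return y, is_outlier


rng = random.Random(0)
X = [(1.0, rng.uniform(0.0, 2.0)) for _ in range(200)]
y, is_outlier = simulate_contamination(
    X, beta=(1.0, 2.0), sigma=0.5, s=0.1, df=3.0, rng=rng
)
print(sum(is_outlier), "of", len(y), "draws came from f_1")
```

Note that the branch for the $f_{1}$ component never touches `beta` or `sigma`, which is the feature of the model that the text singles out.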

Under the model (2), we show that the posterior is robust if $\pi({\sigma})$, the marginal prior for ${\sigma}$, has tails sufficiently lighter than those of the error density $f_{1}$. When $f_{1}$ is log-regularly varying, most prior distributions satisfy this sufficient condition for robustness. Furthermore, we prove that the sufficient condition on the tails of $\pi({\sigma})$ is “nearly” necessary as well: if the error distribution is not log-regularly varying and has lighter tails than $\pi({\sigma})$, then the posterior is not robust. With these conditions, we can identify posterior (non-)robustness for most of the error and prior distributions used in regression models.

Our result can also explain the gap between the non-robustness of the Student’s $t$-distribution in model (1) and its success in posterior inference in numerical studies. For simplicity, assume that only the first observation, $y_{1}$, is outlying and let $|y_{1}|\to\infty$. Then, under the model (1) with $f(y)\propto|y|^{-1-{\alpha}}$ as $|y|\to\infty$ for ${\alpha}>0$ (Student’s $t$-distribution with ${\alpha}$ degrees of freedom), it holds that

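$$p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}})\to\pi({\text{\boldmath$\beta$}},{\sigma})\,{\sigma}^{{\alpha}}\prod_{i=2}^{n}\frac{f((y_{i}-{\text{\boldmath$x$}}_{i}^{\top}{\text{\boldmath$\beta$}})/{\sigma})}{{\sigma}}$$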

as $|y_{1}|\to\infty$. This limit is the product of the posterior density without $y_{1}$ and the factor ${\sigma}^{{\alpha}}$. In other words, under the model (1), the Student’s $t$-distribution can never achieve posterior robustness. By contrast, under the model (2), we have

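$$p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}})\to\pi({\text{\boldmath$\beta$}},{\sigma})\prod_{i=2}^{n}\Big\{(1-s)\frac{f_{0}((y_{i}-{\text{\boldmath$x$}}_{i}^{\top}{\text{\boldmath$\beta$}})/{\sigma})}{{\sigma}}+sf_{1}(y_{i})\Big\}$$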

as $|y_{1}|\to\infty$, provided that $f_{1}$ has sufficiently heavier tails than $f_{0}$. This is precisely the posterior without $y_{1}$, which confirms posterior robustness. The main difference from the model (1) is that the second component $f_{1}$ of (2) does not involve the parameters $({\text{\boldmath$ \beta $}},\sigma)$ of the first component. Thanks to this difference, outliers are not linked to any of the parameters $({\text{\boldmath$ \beta $}},{\sigma})$ in this model and therefore have no effect on the joint posterior distribution of $({\text{\boldmath$ \beta $}},{\sigma})$, as long as $f_{1}$ has heavier tails than $f_{0}$. This observation applies to the general case of multiple outliers, as will be seen below.

The remainder of this paper is organized as follows. In Section 2, sufficient conditions and necessary conditions for posterior robustness are given. In Section 3, a numerical example is given, in which we see that the Kullback-Leibler divergence between the target and available posteriors can diverge or converge to $0$ in some cases. Proofs are given in the Supplementary Material.
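The contrast between the two models can be checked numerically. The sketch below compares the likelihood factor contributed by a single outlying $y_{1}$ under each model, assuming a Student's $t$ density for $f$ and $f_{1}$ and a standard normal $f_{0}$; the helper names and the values of `alpha`, `sigma`, `mean`, and `s` are illustrative assumptions.

```python
import math


def t_pdf(y, alpha):
    """Student's t density with alpha degrees of freedom."""
    const = math.gamma((alpha + 1.0) / 2.0) / (
        math.sqrt(alpha * math.pi) * math.gamma(alpha / 2.0)
    )
    return const * (1.0 + y * y / alpha) ** (-(alpha + 1.0) / 2.0)


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def ratio_model1(y1, mean, sigma, alpha):
    """Likelihood factor of y1 under model (1), divided by f(y1)."""
    return t_pdf((y1 - mean) / sigma, alpha) / sigma / t_pdf(y1, alpha)


def ratio_model2(y1, mean, sigma, alpha, s):
    """Likelihood factor of y1 under model (2), divided by s * f1(y1)."""
    f1 = t_pdf(y1, alpha)
    mixture = (1.0 - s) * normal_pdf((y1 - mean) / sigma) / sigma + s * f1
    return mixture / (s * f1)


alpha, sigma, mean, s = 3.0, 2.0, 0.5, 0.1
for y1 in (1e1, 1e2, 1e4, 1e6):
    print(y1, ratio_model1(y1, mean, sigma, alpha),
          ratio_model2(y1, mean, sigma, alpha, s))
```

As `y1` grows, the first ratio approaches `sigma ** alpha`, so the outlier leaves a factor that depends on the scale parameter, whereas the second ratio approaches 1 and the outlier's contribution becomes free of the parameters.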

Contamination Models and Posterior Robustness

Suppose that we observe

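$$y_{i}\sim(1-s){\rm N}(y_{i}|{\text{\boldmath$x$}}_{i}^{\top}{\text{\boldmath$\beta$}},{\sigma}^{2})+sf_{1}(y_{i})$$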

for $i=1,\dots,n$, where ${\text{\boldmath$ x $}}_{i}$ are continuous explanatory variables and where ${\text{\boldmath$ \beta $}}=({\beta}_{k})_{k=1}^{p}\in\mathbb{R}^{p}$ and ${\sigma}\in(0,\infty)$ are parameters of interest following a prior distribution $\pi({\text{\boldmath$ \beta $}},{\sigma})$. Here, $f_{1}(\cdot)$ is an error density, and $s\in(0,1)$ is the prior probability that an observation is generated from $f_{1}$.

Following the work of Desgagné (2015), let ${\cal K},{\cal L}\subset\{1,\dots,n\}$ satisfy ${\cal K}\cup{\cal L}=\{1,\dots,n\}$ , ${\cal K}\cap{\cal L}=\emptyset$ , and ${\cal K},{\cal L}\neq\emptyset$ . Suppose that $a_{i}\in\mathbb{R}$ , $b_{i}\neq 0$ , and $y_{i}=a_{i}+b_{i}{\omega}$ , ${\omega}\to\infty$ , for $i\in{\cal L}$ , such that ${\cal L}$ represents the set of indices of outlying observations. We say that the posterior is robust to outliers under the above model if $p({\text{\boldmath$ \beta $}},{\sigma}|{\text{\boldmath$ y $}})\to p({\text{\boldmath$ \beta $}},{\sigma}|{\text{\boldmath$ y $}}_{{\cal K}})$ as ${\omega}\to\infty$ , where ${\text{\boldmath$ y $}}=(y_{i})_{i=1}^{n}$ , ${\text{\boldmath$ y $}}_{{\cal K}}=(y_{i})_{i\in{\cal K}}$ , and ${\text{\boldmath$ y $}}_{{\cal L}}=(y_{i})_{i\in{\cal L}}$ .

To derive conditions for posterior robustness, we limit the class of prior distributions for $({\text{\boldmath$ \beta $}},{\sigma})$ . Suppose that

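$$\pi({\text{\boldmath$\beta$}}|{\sigma})=\frac{\pi({\text{\boldmath$\beta$}},{\sigma})}{\pi({\sigma})}\leq M\prod_{k=1}^{p}\Big\{\frac{1}{{\sigma}}\frac{(|{\beta}_{k}|/{\sigma})^{{\kappa}-1}}{(1+|{\beta}_{k}|/{\sigma})^{{\kappa}+\nu}}\Big\},\tag{3}$$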

for some $\nu>0$ , $0<{\kappa}\leq 1$ and $M>0$ , where $\pi({\sigma})=\int_{\mathbb{R}^{p}}\pi({\text{\boldmath$ \beta $}},{\sigma})d{\text{\boldmath$ \beta $}}$ . That is, the ratio of the prior density and some double-sided scaled-beta density (with spike at the origin) must be bounded uniformly by some constant. This condition is satisfied by most of the conditionally independent priors that are commonly used in practice. Examples include shrinkage priors, such as the horseshoe prior (Carvalho et al., 2009, 2010), as well as the normal priors. The condition is also satisfied by some multivariate priors for dependent $\beta$ , including the multivariate normal prior.

Likewise, we assume that the error density $f_{1}$ is bounded as

[TABLE]

for some ${\alpha}\geq 0$ , ${\gamma}\geq-1$ and $M^{\prime}>0$ . The class of distributions that satisfy this condition includes Student’s $t$ -distributions ( ${\alpha}>0$ and ${\gamma}=-1$ ) and log-regularly varying distributions ( ${\alpha}=0$ and ${\gamma}>0$ ).

The following theorem gives a sufficient condition for the posterior to be robust.

Theorem 1.

Suppose that conditions (3) and (4) are satisfied for $\nu>{\alpha}$ . Also, suppose that

$$E[{\sigma}^{|{\cal L}|{\alpha}+\rho}]<\infty\tag{5}$$

for some $\rho>0$ . Then the posterior is robust to outliers under our model; that is, we have

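$$\lim_{{\omega}\to\infty}p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}})=p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}}_{{\cal K}})$$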

at each $({\text{\boldmath$ \beta $}},{\sigma})\in\mathbb{R}^{p}\times(0,\infty)$ .

The moment condition for $\pi({\sigma})$ in (5) could be a strong requirement, especially when $\alpha>0$ and multiple outliers are expected. Next, we prove that posterior robustness does not hold if this moment condition fails and, in addition, the tails of the error density are not sufficiently heavy.

Theorem 2.

Let $h\colon\mathbb{R}^{p}\to(0,\infty)$ be a probability density and suppose that $\pi({\text{\boldmath$ \beta $}}|{\sigma})=h({\text{\boldmath$ \beta $}}/{\sigma})/{\sigma}^{p}$ . Let ${\alpha}>0$ and suppose that

[TABLE]

for all $y\in\mathbb{R}$ for some $M^{\prime}>0$ . Suppose that

$$\pi({\sigma})\geq\frac{1/{\widetilde{M}}}{{\sigma}^{|{\cal L}|{\alpha}+1-\rho}}\tag{6}$$

for all ${\sigma}>1$ for some ${\widetilde{M}}>0$ and $0<\rho<1$ . Then we have

$$\lim_{{\omega}\to\infty}p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}})=0$$

at each $({\text{\boldmath$ \beta $}},{\sigma})\in\mathbb{R}^{p}\times(0,\infty)$ .

Clearly, under the assumptions of Theorem 2, the posterior does not converge in the usual sense. Indeed, we see in the next section that the Kullback-Leibler divergence between $p({\text{\boldmath$ \beta $}},{\sigma}|{\text{\boldmath$ y $}}_{{\cal K}})$ and $p({\text{\boldmath$ \beta $}},{\sigma}|{\text{\boldmath$ y $}})$ diverges in such a situation.

From Theorems 1 and 2, we can determine whether a prior $\pi({\sigma})$ yields a robust posterior or not in most cases. Suppose that ${\text{\boldmath$ \beta $}}/{\sigma}$ and ${\sigma}$ are independent and that (3) holds. Suppose that equality holds in (4). Then, if we use a gamma prior for ${\sigma}^{2}$ , the moment condition in (5) is always satisfied; hence the posterior is robust regardless of the choice of ${\alpha}$ . If we use an inverse gamma prior or a scaled beta prior for ${\sigma}^{2}$ , either (5) or (6) is satisfied, depending on the hyperparameters. That is, there exists a threshold separating robust and non-robust cases. These observations are summarized in Table 1.
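As a worked check of the inverse-gamma row of Table 1, the threshold can be read off the tail of the prior density. The derivation below is a sketch that drops normalizing constants; the symbols $m$ and $c$ (a generic positive constant) are notation introduced here for convenience.

```latex
% Inverse-gamma prior IG(A, B) for sigma^2, written as a density in sigma:
\pi(\sigma) \propto \sigma^{-(2A+1)} \exp(-B/\sigma^{2}), \qquad \sigma > 0.

% Condition (5): the moment of order m = |\mathcal{L}|\alpha + \rho.
E[\sigma^{m}] \propto \int_{0}^{\infty} \sigma^{m-2A-1} \exp(-B/\sigma^{2}) \, d\sigma < \infty
\iff m < 2A,
% because the exponential factor controls the integrand near 0
% and tends to 1 as sigma tends to infinity.
% Hence (5) holds for some \rho > 0 exactly when 2A > |\mathcal{L}|\alpha.

% Condition (6): for \sigma > 1 we have \exp(-B/\sigma^{2}) \geq e^{-B}, so
\pi(\sigma) \geq c \, \sigma^{-(2A+1)} \geq c \, \sigma^{-(|\mathcal{L}|\alpha + 1 - \rho)}
\quad \text{whenever } 2A \leq |\mathcal{L}|\alpha - \rho,
% and such a \rho \in (0, 1) exists exactly when 2A < |\mathcal{L}|\alpha.
```

The same tail comparison applied to the scaled-beta density, which behaves like $\sigma^{-(2F+1)}$ for large $\sigma$, gives the thresholds in terms of $2F$.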

The sufficient conditions obtained in this study differ from those in Gagnon et al. (2019) and Hamura et al. (2022) not only in the model specification given in (1) and (2) but also in the requirements on the error and prior densities. Table 2 summarizes the sufficient conditions for posterior robustness in the literature and in Theorem 1. It is worth emphasizing that, in Theorem 1, we do not make any assumption directly on $|{\cal L}|$, the number of outliers. In proving posterior robustness, it is inevitable to control the number of indices $i\in{\cal L}$ for which $|y_{i}-{{\text{\boldmath$ x $}}_{i}}^{\top}{\text{\boldmath$ \beta $}}|$ is outlying and the number of indices $i\in{\cal K}$ for which $|y_{i}-{{\text{\boldmath$ x $}}_{i}}^{\top}{\text{\boldmath$ \beta $}}|$ is close to $0$. These numbers are further studied in our Lemma 1, which enables the proof of posterior robustness without any assumption on $|{\cal L}|$. For details, see the Supplementary Material. In addition, as pointed out in the introduction, our model does not require $f_{1}$ to be log-regularly varying in proving posterior robustness, which is also significantly different from the settings in the literature. However, we also note that these sets of conditions are not nested in one another. For example, the conditions in Gagnon et al. (2019) cover the improper prior for $({\text{\boldmath$ \beta $}},{\sigma})$.

Numerical Examples

Here, we numerically calculate the Kullback-Leibler (KL) divergence of the target posterior distribution $p({\text{\boldmath$ \beta $}},{\sigma}|{\text{\boldmath$ y $}}_{{\cal K}})$ from the available posterior distribution $p({\text{\boldmath$ \beta $}},{\sigma}|{\text{\boldmath$ y $}})$ , which is given by

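$$\text{KL}=\int_{\mathbb{R}^{p}\times(0,\infty)}p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}}_{{\cal K}})\log\frac{p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}}_{{\cal K}})}{p({\text{\boldmath$\beta$}},{\sigma}|{\text{\boldmath$y$}})}\,d({\text{\boldmath$\beta$}},{\sigma}).$$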
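A KL divergence of this form is an expectation of a log density ratio under the first density, so it can be approximated by averaging that log ratio over draws from it. The sketch below shows the generic estimator on a toy pair of normal densities whose KL divergence is known in closed form; it is not the paper's implementation, and `mc_kl`, `normal_logpdf`, and the toy densities are illustrative assumptions. In the paper's setting, the role of `p` would be played by $p({\text{\boldmath$ \beta $}},{\sigma}|{\text{\boldmath$ y $}}_{{\cal K}})$ and that of `q` by $p({\text{\boldmath$ \beta $}},{\sigma}|{\text{\boldmath$ y $}})$.

```python
import math
import random


def mc_kl(sample_p, log_p, log_q, n_samples, rng):
    """Monte Carlo estimate of KL(p || q) = E_p[log p(X) - log q(X)]."""
    total = 0.0
    for _ in range(n_samples):
        x = sample_p(rng)  # draw from the first density p
        total += log_p(x) - log_q(x)
    return total / n_samples


def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2)."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


# Toy check: KL(N(0, 1) || N(1, 1)) equals 0.5 exactly.
rng = random.Random(1)
estimate = mc_kl(
    sample_p=lambda r: r.gauss(0.0, 1.0),
    log_p=lambda x: normal_logpdf(x, 0.0, 1.0),
    log_q=lambda x: normal_logpdf(x, 1.0, 1.0),
    n_samples=20000,
    rng=rng,
)
print(estimate)
```

The estimator needs both log densities including their normalizing constants, which is one reason a prior that keeps the posterior analytically tractable is convenient for this kind of study.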

We use the conjugate normal-inverse gamma prior:

[TABLE]

where $A,B,C>0$. Under this prior, the posterior becomes a finite mixture of known distributions and is analytically and numerically tractable. We consider the following two error densities:

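$$f_{1}^{\rm{light}}(y)=\frac{{\alpha}/2}{(1+|y|)^{1+{\alpha}}},\quad y\in\mathbb{R},$$

$$f_{1}^{\rm{heavy}}(y)=\frac{{\gamma}/2}{1+|y|}\frac{1}{\{1+\log(1+|y|)\}^{1+{\gamma}}},\quad y\in\mathbb{R},$$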

where ${\alpha},{\gamma}>0$. The first error distribution, $f_{1}^{\rm{light}}$, is the double-sided scaled-beta distribution, whose tail behavior is equivalent to that of Student’s $t$-distribution. The second error distribution, $f_{1}^{\rm{heavy}}$, is the unfolded version of the log-Pareto distribution of Cormann and Reiss (2009).
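To see how differently the two tails behave, the sketch below codes the densities $f_{1}^{\rm{light}}(y)=({\alpha}/2)/(1+|y|)^{1+{\alpha}}$ and $f_{1}^{\rm{heavy}}(y)=({\gamma}/2)/[(1+|y|)\{1+\log(1+|y|)\}^{1+{\gamma}}]$ together with their two-sided tail probabilities. The closed-form tail functions are derived here by direct integration and are not stated in the paper; all function names are illustrative.

```python
import math


def f1_light(y, alpha):
    """Double-sided scaled-beta density (polynomial tails)."""
    return (alpha / 2.0) / (1.0 + abs(y)) ** (1.0 + alpha)


def f1_heavy(y, gamma):
    """Unfolded log-Pareto density (log-regularly varying tails)."""
    a = 1.0 + abs(y)
    return (gamma / 2.0) / a / (1.0 + math.log(a)) ** (1.0 + gamma)


def tail_light(t, alpha):
    """P(|Y| > t) under f1_light, by integrating the density over |y| > t."""
    return (1.0 + t) ** (-alpha)


def tail_heavy(t, gamma):
    """P(|Y| > t) under f1_heavy, using the substitution u = log(1 + y)."""
    return (1.0 + math.log(1.0 + t)) ** (-gamma)


alpha, gamma = 3.0, 1.5
for t in (1e1, 1e3, 1e5):
    print(t, tail_light(t, alpha), tail_heavy(t, gamma))
```

Both tail functions equal 1 at `t = 0`, confirming that each density integrates to one, and the log-Pareto tail decays only like a power of `log(t)`, far more slowly than the polynomial tail.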

As an example, we set ${\alpha}=3$ and ${\gamma}=3/2$ , $B=C=1$ , $s=1/10$ , ${\text{\boldmath$ y $}}=(1,2,3,4,{\omega})^{\top}$ , ${\omega}\in\{10^{1},10^{2},10^{3},10^{4},10^{5}\}$ , and

[TABLE]

In this example, $n=5$, $p=2$, ${\cal K}=\{1,\dots,4\}$ and ${\cal L}=\{5\}$. For the two cases $A=1/10$ and $A=2$, and for each of the error distributions $f_{1}^{\rm{light}}$ and $f_{1}^{\rm{heavy}}$, we obtain the Monte Carlo approximation of the KL divergence by using 1,000 samples from the posterior distributions. The results are summarized in Figure 1. It is clearly seen that the KL divergence does not converge to $0$ when $f_{1}=f_{1}^{\rm{light}}$ and $A=1/10$, since the condition of Theorem 2 is satisfied and the posterior is not convergent. In the other three cases, where the sufficient condition of Theorem 1 is satisfied, the KL divergence converges to $0$.
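The four cases can be matched against the inverse-gamma thresholds of Table 1 with a few lines of code. This is a sketch under stated assumptions: `ig_prior_verdict` is a hypothetical helper, and the log-regularly varying $f_{1}^{\rm{heavy}}$ is assigned ${\alpha}=0$, following the parametrization in which log-regularly varying densities correspond to ${\alpha}=0$ and ${\gamma}>0$.

```python
def ig_prior_verdict(A, n_outliers, alpha):
    """Classify an IG(A, B) prior on sigma^2 with the thresholds of Table 1.

    Hypothetical helper: compares 2A with |L| * alpha, where |L| is the
    number of outliers and alpha is the polynomial tail index of f_1.
    """
    threshold = n_outliers * alpha
    if 2.0 * A > threshold:
        return "robust"
    if 2.0 * A < threshold:
        return "not robust"
    return "boundary case, not covered by Table 1"


# alpha = 3 for f1_light; alpha = 0 is assumed for the log-regularly varying f1_heavy.
tail_index = {"light": 3.0, "heavy": 0.0}
verdicts = {
    (name, A): ig_prior_verdict(A, n_outliers=1, alpha=alpha)
    for name, alpha in tail_index.items()
    for A in (0.1, 2.0)
}
for case, verdict in verdicts.items():
    print(case, verdict)
```

Only the combination of the light-tailed error density with $A=1/10$ falls on the non-robust side, in line with the single case in which the KL divergence fails to vanish.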

Next, under the same setting, Figure 2 shows the posterior and predictive distributions of ${\beta}_{1}+{\tilde{x}}_{2}{\beta}_{2}$ and ${\tilde{y}}\sim(1-s)f_{0}(\{{\tilde{y}}-({\beta}_{1}+{\tilde{x}}_{2}{\beta}_{2})\}/{\sigma})/{\sigma}+sf_{1}({\tilde{y}})$ given $y$ with ${\omega}=10^{2}$ for ${\tilde{x}}_{2}\in\{1.5,1.6,\dots,1.9,2.0\}$ . When $f_{1}=f_{1}^{\rm{light}}$ and $A=1/10$ , the credible intervals become extremely wide since the posterior of $({\text{\boldmath$ \beta $}},{\sigma})$ converges to zero. Comparing the two panels for $f_{1}^{\rm{light}}$ and $f_{1}^{\rm{heavy}}$ when $A=2$ , it can be seen that the predictive intervals are slightly wider for $f_{1}^{\rm{heavy}}$ than for $f_{1}^{\rm{light}}$ , reflecting the difference in the tail behavior of the two error densities.

Acknowledgments

Research of the authors was supported in part by JSPS KAKENHI Grant Numbers 22K20132, 19K11852, 17K17659, and 21H00699 from the Japan Society for the Promotion of Science.

A Basic Lemma

The following result is used in the proof of Theorem 1 of the main text.

Lemma 1.

Let $p\in\mathbb{N}$ and $m,n\in\mathbb{N}$ . Let $a_{i}\in\mathbb{R}$ , $b_{i}\neq 0$ , and ${\text{\boldmath$ x $}}_{i}\in\mathbb{R}^{p}$ , $i=1,\dots,m$ , and $c_{j}\in\mathbb{R}$ and ${\text{\boldmath$ z $}}_{j}\in\mathbb{R}^{p}$ , $j=1,\dots,n$ , be continuous variables such that there is no exact collinearity.

(i) For any $l\in\mathbb{N}$, there exist ${\delta},{\varepsilon},M>0$ such that for all ${\omega}\geq M$ and all ${\text{\boldmath$ \beta $}}\in\mathbb{R}^{p}$, the condition that

– there exist distinct indices $i_{1},\dots,i_{l}=1,\dots,m$ such that $|a_{i_{1}}+b_{i_{1}}{\omega}-{{\text{\boldmath$ x $}}_{i_{1}}}^{\top}{\text{\boldmath$ \beta $}}|,\dots,|a_{i_{l}}+b_{i_{l}}{\omega}-{{\text{\boldmath$ x $}}_{i_{l}}}^{\top}{\text{\boldmath$ \beta $}}|\leq{\varepsilon}{\omega}$

implies that

– there exist distinct indices $k_{1},\dots,k_{l}=1,\dots,p$ such that $|{\beta}_{k_{1}}|,\dots,|{\beta}_{k_{l}}|\geq{\delta}{\omega}$.

(ii) Let $c_{j}=0$ and ${\text{\boldmath$ z $}}_{j}={\text{\boldmath$ e $}}_{-j}^{(p)}$ for $j=-1,\dots,-p$. Let ${\widehat{J}}=\{-1,\dots,-p\}\cup\{1,\dots,n\}$. Let $\eta>0$ be arbitrary. Then for any $1\leq l\leq p$, there exist ${\varepsilon},M>0$ such that for all ${\omega}\geq M$ and all ${\text{\boldmath$ \beta $}}\in\mathbb{R}^{p}$, the condition that

– there exist distinct indices $i_{1},\dots,i_{l}=1,\dots,m$ such that $|a_{i_{1}}+b_{i_{1}}{\omega}-{{\text{\boldmath$ x $}}_{i_{1}}}^{\top}{\text{\boldmath$ \beta $}}|,\dots,|a_{i_{l}}+b_{i_{l}}{\omega}-{{\text{\boldmath$ x $}}_{i_{l}}}^{\top}{\text{\boldmath$ \beta $}}|\leq{\varepsilon}{\omega}$

implies that

– the set of indices $\{j\in{\widehat{J}}\mid|c_{j}-{{\text{\boldmath$ z $}}_{j}}^{\top}{\text{\boldmath$ \beta $}}|\leq\eta\}$ has at most $p-l$ elements.

(iii) Let $\eta>0$ and $\overline{{\varepsilon}}>0$ be arbitrary. Then there exist ${\delta}>0$, $0<{\varepsilon}<\overline{{\varepsilon}}$, and $M>0$ such that for all ${\omega}\geq M$,

[TABLE]

Proof. For part (i), note that for all ${\varepsilon}>0$ , ${\omega}>0$ , and ${\text{\boldmath$ \beta $}}\in\mathbb{R}^{p}$ and all $i=1,\dots,m$ , we have that

[TABLE]

In particular, $|a_{i}+b_{i}{\omega}-{{\text{\boldmath$ x $}}_{i}}^{\top}{\text{\boldmath$ \beta $}}|\leq{\varepsilon}{\omega}$ implies

[TABLE]

for some $s\in[\pm{\varepsilon}]$ . Therefore, part (i) is trivial if $l=1$ . Let $1\leq l^{\prime}\leq p$ be arbitrary and assume that part (i) holds for $l=l^{\prime}$ and let ${{\delta}}^{\prime},{{\varepsilon}}^{\prime},M^{\prime}>0$ be such that for all ${\omega}\geq M^{\prime}$ and all ${\text{\boldmath$ \beta $}}\in\mathbb{R}^{p}$ , the condition that there exist distinct indices $i_{1},\dots,i_{l^{\prime}}=1,\dots,m$ such that $|a_{i_{1}}+b_{i_{1}}{\omega}-{{\text{\boldmath$ x $}}_{i_{1}}}^{\top}{\text{\boldmath$ \beta $}}|,\dots,|a_{i_{l^{\prime}}}+b_{i_{l^{\prime}}}{\omega}-{{\text{\boldmath$ x $}}_{i_{l^{\prime}}}}^{\top}{\text{\boldmath$ \beta $}}|\leq{{\varepsilon}}^{\prime}{\omega}$ implies that there exist distinct indices $k_{1},\dots,k_{l^{\prime}}=1,\dots,p$ such that $|{\beta}_{k_{1}}|,\dots,|{\beta}_{k_{l^{\prime}}}|\geq{{\delta}}^{\prime}{\omega}$ . Let ${\cal I}=\{(i_{1},\dots,i_{l^{\prime}+1})^{\top}\in\{1,\dots,m\}^{l^{\prime}+1}|\text{$ i_{1},\dots,i_{l^{\prime}+1} $ are distinct}\}$ and ${\cal K}=\{(k_{1},\dots,k_{l^{\prime}})^{\top}\in\{1,\dots,p\}^{l^{\prime}}|\text{$ k_{1},\dots,k_{l^{\prime}} $ are distinct}\}$ . For ${\text{\boldmath$ i $}}=(i_{1},\dots,i_{l^{\prime}+1})^{\top}\in{\cal I}$ and ${\text{\boldmath$ k $}}=(k_{1},\dots,k_{l^{\prime}})^{\top}\in{\cal K}$ , let ${\text{\boldmath$ a $}}({\text{\boldmath$ i $}})=(a_{i_{1}},\dots,a_{i_{l^{\prime}+1}})^{\top}$ and ${\text{\boldmath$ b $}}({\text{\boldmath$ i $}})=(b_{i_{1}},\dots,b_{i_{l^{\prime}+1}})^{\top}$ , let $1\leq{\tilde{k}}_{1}({\text{\boldmath$ k $}})<\dots<{\tilde{k}}_{p-l^{\prime}}({\text{\boldmath$ k $}})\leq p$ be such that $\{{\tilde{k}}_{1}({\text{\boldmath$ k $}}),\dots,{\tilde{k}}_{p-l^{\prime}}({\text{\boldmath$ k $}})\}=\{1,\dots,p\}\setminus\{k_{1},\dots,k_{l^{\prime}}\}$ , and let

[TABLE]

Let

[TABLE]

Fix ${\omega}\geq M$ and ${\text{\boldmath$ \beta $}}\in\mathbb{R}^{p}$ . Suppose that there exist distinct indices $i_{1},\dots,i_{l^{\prime}+1}=1,\dots,m$ such that $|a_{i_{1}}+b_{i_{1}}{\omega}-{{\text{\boldmath$ x $}}_{i_{1}}}^{\top}{\text{\boldmath$ \beta $}}|,\dots,|a_{i_{l^{\prime}+1}}+b_{i_{l^{\prime}+1}}{\omega}-{{\text{\boldmath$ x $}}_{i_{l^{\prime}+1}}}^{\top}{\text{\boldmath$ \beta $}}|\leq{\varepsilon}{\omega}$ . Then, by assumption, there exist distinct indices $k_{1},\dots,k_{l^{\prime}}=1,\dots,p$ such that $|{\beta}_{k_{1}}|,\dots,|{\beta}_{k_{l^{\prime}}}|\geq{{\delta}}^{\prime}{\omega}$ . Let $1\leq{\tilde{k}}_{1}<\dots<{\tilde{k}}_{p-l^{\prime}}\leq p$ be such that $\{{\tilde{k}}_{1},\dots,{\tilde{k}}_{p-l^{\prime}}\}=\{1,\dots,p\}\setminus\{k_{1},\dots,k_{l^{\prime}}\}$ and let ${\text{\boldmath$ E $}}=({\text{\boldmath$ e $}}_{k_{1}}^{(p)},\dots,{\text{\boldmath$ e $}}_{k_{l^{\prime}}}^{(p)},{\text{\boldmath$ e $}}_{{\tilde{k}}_{1}}^{(p)},\dots,{\text{\boldmath$ e $}}_{{\tilde{k}}_{p-l^{\prime}}}^{(p)})^{\top}$ . Then, since

[TABLE]

for some ${\text{\boldmath$ s $}}\in[\pm{\varepsilon}]^{l^{\prime}+1}$ , where

[TABLE]

and where ${\text{\boldmath$ a $}}=(a_{i_{1}},\dots,a_{i_{l^{\prime}+1}})^{\top}$ and ${\text{\boldmath$ b $}}=(b_{i_{1}},\dots,b_{i_{l^{\prime}+1}})^{\top}$ , we have

[TABLE]

Therefore, $p\geq l^{\prime}+1$ ; otherwise,

[TABLE]

Thus,

[TABLE]

from which it follows that

[TABLE]

and hence that

[TABLE]

This proves part (i).
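The base case $l=1$ of part (i) can be checked numerically. The sketch below is an illustration only and is not taken from the paper: it assumes $0<{\varepsilon}<|b|$ and a nonzero covariate vector, and it uses the explicit constants ${\delta}=(|b|-{\varepsilon})/(2\|x\|_{1})$ and $M=\max\{1,2|a|/(|b|-{\varepsilon})\}$, which follow from the triangle inequality $|x^{\top}\beta|\geq(|b|-{\varepsilon}){\omega}-|a|$ together with $|x^{\top}\beta|\leq\|x\|_{1}\max_{k}|\beta_{k}|$. The function names are hypothetical.

```python
import random


def lemma_l1_constants(a, b, x, eps):
    """Constants (delta, M) for the l = 1 case; assumes 0 < eps < |b| and x != 0.

    These explicit choices are an assumption of this sketch, not the paper's.
    """
    gap = abs(b) - eps
    norm1 = sum(abs(v) for v in x)
    delta = gap / (2.0 * norm1)
    M = max(1.0, 2.0 * abs(a) / gap)
    return delta, M


def sample_beta_near_outlier(a, b, x, omega, eps, rng):
    """Draw beta such that a + b*omega - x^T beta = s*omega with |s| <= eps."""
    beta0 = [rng.gauss(0.0, omega) for _ in x]
    s = rng.uniform(-eps, eps)
    target = a + b * omega - s * omega  # required value of x^T beta
    xx = sum(v * v for v in x)
    shift = (target - sum(v * w for v, w in zip(x, beta0))) / xx
    # Move beta0 along x so that x^T beta hits the target exactly.
    return [w + shift * v for v, w in zip(x, beta0)]


def check_l1_case(a, b, x, eps, trials=1000, seed=0):
    """Return True if max_k |beta_k| >= delta*omega held in every random trial."""
    rng = random.Random(seed)
    delta, M = lemma_l1_constants(a, b, x, eps)
    for _ in range(trials):
        omega = M * (1.0 + 9.0 * rng.random())  # any omega >= M
        beta = sample_beta_near_outlier(a, b, x, omega, eps, rng)
        if max(abs(w) for w in beta) < delta * omega:
            return False
    return True


print(check_l1_case(a=0.5, b=-2.0, x=[1.0, -3.0, 0.7], eps=0.5))
```

A random search of this kind cannot prove the statement; it only confirms that the stated constants are not contradicted on the sampled configurations.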

For part (ii), fix ${\varepsilon},M>0$ . Fix ${\omega}\geq M$ and ${\text{\boldmath$ \beta $}}\in\mathbb{R}^{p}$ . Suppose that there exist distinct indices $i_{1},\dots,i_{l}=1,\dots,m$ such that $|a_{i_{1}}+b_{i_{1}}{\omega}-{{\text{\boldmath$ x $}}_{i_{1}}}^{\top}{\text{\boldmath$ \beta $}}|,\dots,|a_{i_{l}}+b_{i_{l}}{\omega}-{{\text{\boldmath$ x $}}_{i_{l}}}^{\top}{\text{\boldmath$ \beta $}}|\leq{\varepsilon}{\omega}$ . Suppose further that there exist distinct indices $j_{1},\dots,j_{p-l+1}\in{\widehat{J}}$ such that $|c_{j_{1}}-{{\text{\boldmath$ z $}}_{j_{1}}}^{\top}{\text{\boldmath$ \beta $}}|,\dots,|c_{j_{p-l+1}}-{{\text{\boldmath$ z $}}_{j_{p-l+1}}}^{\top}{\text{\boldmath$ \beta $}}|\leq\eta$ . Then

[TABLE]

for some ${\text{\boldmath$ s $}}\in[\pm{\varepsilon}]^{l}$ and ${\text{\boldmath$ t $}}\in[\pm\eta]^{p-l}$ and $t\in[\pm\eta]$ , where

[TABLE]

and where ${\text{\boldmath$ a $}}=(a_{i_{1}},\dots,a_{i_{l}})^{\top}$ , ${\text{\boldmath$ b $}}=(b_{i_{1}},\dots,b_{i_{l}})^{\top}$ , ${\text{\boldmath$ c $}}=(c_{j_{1}},\dots,c_{j_{p-l}})^{\top}$ , and $c=c_{j_{p-l+1}}$ . Therefore,

[TABLE]

Thus,

[TABLE]

which is a contradiction if ${\varepsilon}>0$ is sufficiently small and $M>0$ is sufficiently large. This proves part (ii).
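The contradiction in part (ii) can be illustrated in the smallest nontrivial setting, $p=2$ and $l=1$. The sketch below is not the authors' argument, and its function names and numerical values are hypothetical. If $p-l+1=2$ non-outlying residuals are held within $\eta$, then $\beta$ is pinned to a bounded set, so the outlying residual divided by ${\omega}$ approaches $|b|$ and eventually exceeds any ${\varepsilon}<|b|$. Only the four corners of the box $[\pm\eta]^{2}$ are evaluated, and the two non-outlying covariate rows are assumed to be linearly independent.

```python
from itertools import product


def solve_2x2(Z, rhs):
    """Solve Z beta = rhs for a 2x2 matrix Z by Cramer's rule."""
    (z11, z12), (z21, z22) = Z
    det = z11 * z22 - z12 * z21
    if abs(det) < 1e-12:
        raise ValueError("rows of Z must be linearly independent")
    return [
        (rhs[0] * z22 - z12 * rhs[1]) / det,
        (z11 * rhs[1] - z21 * rhs[0]) / det,
    ]


def min_outlier_ratio(a, b, x, Z, c, eta, omega):
    """Smallest |a + b*omega - x^T beta| / omega over the corners t of [-eta, eta]^2,

    where beta solves c_j - z_j^T beta = t_j for the two non-outlying rows.
    """
    ratios = []
    for t in product((-eta, eta), repeat=2):
        beta = solve_2x2(Z, [c[0] - t[0], c[1] - t[1]])
        resid = a + b * omega - (x[0] * beta[0] + x[1] * beta[1])
        ratios.append(abs(resid) / omega)
    return min(ratios)


# Hypothetical inputs: one outlying observation (a, b, x), two regular rows (Z, c).
a, b, x = 1.0, 3.0, (2.0, -1.0)
Z, c, eta = ((1.0, 0.5), (0.2, 1.0)), (0.3, -0.4), 1.0

for omega in (1e1, 1e3, 1e6):
    print(omega, min_outlier_ratio(a, b, x, Z, c, eta, omega))
```

As ${\omega}$ grows the printed ratio approaches $|b|=3$, so the outlying residual cannot stay below ${\varepsilon}{\omega}$ for a small ${\varepsilon}$, in line with the choice of a sufficiently small ${\varepsilon}$ and a sufficiently large $M$ in the proof.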

Part (iii) follows from parts (i) and (ii). This completes the proof. $\Box$

Proof of Theorem 1

Here, we prove Theorem 1.

Proof of Theorem 1. The posterior is

[TABLE]

where

[TABLE]

Since

[TABLE]

it is sufficient to show that

[TABLE]

Since for all ${\varepsilon}>0$ and all $i\in{\cal L}$ , $|y_{i}-{{\text{\boldmath$ x $}}_{i}}^{\top}{\text{\boldmath$ \beta $}}|\geq{\varepsilon}{\omega}$ and $|y_{i}|\geq 1$ imply

[TABLE]

for some $M_{1}>0$ , it follows from the dominated convergence theorem that

[TABLE]

for all $0<{\varepsilon}<\min_{i\in{\cal L}}|b_{i}|/2$ . Thus, since

[TABLE]

it suffices to prove that for all $\widetilde{{\cal K}}\subset{\cal K}$ and all $\widetilde{{\cal L}}\subset{\cal L}$ , there exists $0<{\varepsilon}<\min_{i\in{\cal L}}|b_{i}|/2$ such that

[TABLE]

where

[TABLE]

converges to [math] as ${\omega}\to\infty$ . This clearly holds for $\widetilde{{\cal L}}=\emptyset$ for all $\widetilde{{\cal K}}\subset{\cal K}$ .

First, fix $\emptyset\neq\widetilde{{\cal L}}\subset{\cal L}$ and let $\widetilde{{\cal K}}=\emptyset$ . Then, by part (i) of Lemma 1, there exist ${\delta}>0$ , $0<{\varepsilon}<\min_{i\in{\cal L}}|b_{i}|/2$ , and $M>0$ such that for all ${\omega}\geq M$ ,

[TABLE]

Since

[TABLE]

for all ${\omega}>0$ , clearly

[TABLE]

by the dominated convergence theorem. Fix $1\leq l\leq\min\{|\widetilde{{\cal L}}|,p\}$ , $i_{1},\dots,i_{l}\in\widetilde{{\cal L}}$ with $i_{1}<\dots<i_{l}$ , and $1\leq k_{1}<\dots<k_{l}\leq p$ and let

[TABLE]

Then

[TABLE]

for some $M_{2},M_{3}>0$ for any ${\alpha}<{{\alpha}}^{\prime}\leq\nu$ and therefore

[TABLE]

for some ${\alpha}<{{\alpha}}^{\prime}\leq\nu$ .

Next, fix $\emptyset\neq\widetilde{{\cal L}}\subset{\cal L}$ and $\emptyset\neq\widetilde{{\cal K}}\subset{\cal K}$ . Let $\widehat{{\cal K}}=\{-1,\dots,-p\}\cup\widetilde{{\cal K}}$ . Let $y_{j}=0$ and ${\text{\boldmath$ x $}}_{j}={\text{\boldmath$ e $}}_{-j}^{(p)}$ for $j=-1,\dots,-p$ . Then, by part (iii) of Lemma 1, there exist ${\delta}>0$ , $0<{\varepsilon}<\min_{i\in{\cal L}}|b_{i}|/2$ , and $M>0$ such that for all ${\omega}\geq M$ ,

[TABLE]

Clearly,

[TABLE]

Fix $1\leq l\leq\min\{|\widetilde{{\cal L}}|,p\}$ , $i_{1},\dots,i_{l}\in\widetilde{{\cal L}}$ with $i_{1}<\dots<i_{l}$ , and $1\leq k_{1}<\dots<k_{l}\leq p$ . Let

[TABLE]

As in the previous case, for some ${\alpha}<{{\alpha}}^{\prime}\leq\nu$ that is sufficiently close to ${\alpha}$ ,

[TABLE]

for some $M_{4}>0$ . Therefore,

[TABLE]

as ${\omega}\to\infty$ . Now, suppose that $p\geq l+1$ and fix $1\leq q\leq p-l$ and $j_{1},\dots,j_{q}\in\widehat{{\cal K}}$ with $j_{1}<\dots<j_{q}$ . Then if ${\omega}>1/{\delta}$ ,

[TABLE]

where the equality follows since there is no point ${\widetilde{\text{\boldmath$ \beta $}}}=({\tilde{\beta}}_{k})_{k=1}^{p}\in\mathbb{R}^{p}$ satisfying $|{\tilde{\beta}}_{-j}|\geq{\delta}{\omega}$ and $|y_{j}-{{\text{\boldmath$ x $}}_{j}}^{\top}{\widetilde{\text{\boldmath$ \beta $}}}|\leq 1$ for some $j=-1,\dots,-p$ . The right-hand side converges to [math] as ${\omega}\to\infty$ regardless of whether $\{-k_{1},\dots,-k_{l}\}\cap\{j_{1},\dots,j_{q}\}=\emptyset$ or not. This completes the proof. $\Box$

Proof of Theorem 2

Here, we prove Theorem 2.

Proof of Theorem 2. As in the proof of Theorem 1, we have

[TABLE]

where

[TABLE]

and

[TABLE]

Now, if ${\omega}$ is sufficiently large such that $|y_{i}|\leq 2|b_{i}|{\omega}$ for all $i\in{\cal L}$ , then

[TABLE]

Therefore, by making the change of variables ${\sigma}={\omega}s$ , we obtain

[TABLE]

This completes the proof. $\Box$

