Robust inference for threshold regression models

Javier Hidalgo; Jungyoon Lee; Myung Hwan Seo

arXiv:1702.00836·math.ST·January 15, 2020

TL;DR

This paper develops robust inference methods for threshold regression models that are valid whether the threshold point has a kink or a jump, addressing irregularities in the likelihood function and providing practical confidence interval procedures.

Contribution

It introduces a unified inference framework for threshold models that works under both continuity and discontinuity assumptions, handling irregularities in the likelihood Hessian.

Findings

1. The proposed method achieves correct coverage probabilities in simulations.

2. The scale parameter can be consistently estimated using a kernel method.

3. Bootstrap test inversion provides reliable confidence intervals in finite samples.

Abstract

This paper is concerned with inference in threshold regression models when practitioners do not know whether the true specification has a kink or a jump at the threshold point. We nest previous works that assume either continuity or discontinuity at the threshold point and develop robust inference methods on the parameters of the model, which are valid under both specifications. In particular, we find that the parameter values under the kink restriction are irregular points of the Hessian matrix of the expected Gaussian quasi-likelihood. This irregularity destroys the asymptotic normality and induces the nonstandard cube root convergence rate for the threshold estimate. However, it also enables us to obtain the same asymptotic distribution as in Hansen (2000) for the quasi-likelihood ratio statistic for the unknown threshold up to an unknown scale parameter. We show that this…

Tables

Table 1: Monte Carlo size of the test $H_{0}:\gamma=\gamma_{0}$ and coverage probability of confidence intervals for $\gamma_{0}$; model A: $q_{t}\neq x_{t}$, $\delta=n^{-\varphi}\sqrt{10}/4$

			Size				Coverage probability
			$\gamma_{0}=$ median of $q_{t}$ (2)				$\gamma_{0}=$ median of $q_{t}$ (2)			$\gamma_{0}=$ third quartile of $q_{t}$ (2.674)
$\varphi$	Method	$s$ \ $n$	100	250	500	$\zeta$ \ $n$	100	250	500	100	250	500
1/4	Asym	0.01	0.095	0.059	0.044	0.9	0.733	0.770	0.774	0.811	0.834	0.844
		0.05	0.195	0.153	0.130	0.95	0.818	0.832	0.857	0.870	0.895	0.914
		0.1	0.290	0.242	0.200	0.99	0.916	0.938	0.950	0.953	0.971	0.980
	B/rap	0.01	0.003	0.015	0.009	0.9	0.756	0.810	0.840	0.783	0.826	0.852
		0.05	0.052	0.055	0.037	0.95	0.833	0.880	0.910	0.859	0.892	0.915
		0.1	0.106	0.095	0.083	0.99	0.928	0.959	0.969	0.935	0.965	0.980
1/8	Asym	0.01	0.068	0.037	0.029	0.9	0.79	0.837	0.897	0.817	0.835	0.872
		0.05	0.164	0.092	0.077	0.95	0.856	0.898	0.923	0.873	0.91	0.914
		0.1	0.214	0.15	0.129	0.99	0.933	0.961	0.975	0.949	0.964	0.972
	B/rap	0.01	0.006	0.009	0.008	0.9	0.791	0.846	0.881	0.792	0.827	0.871
		0.05	0.046	0.052	0.049	0.95	0.858	0.907	0.93	0.859	0.9	0.917
		0.1	0.099	0.095	0.105	0.99	0.936	0.968	0.98	0.938	0.963	0.972

Table 2: Monte Carlo size of the test $H_{0}:\gamma=\gamma_{0}$ and coverage probability of confidence intervals for $\gamma_{0}$; model B: $q_{t}=x_{t}$, $\delta=n^{-\varphi}\sqrt{10}/4$

			Size				Coverage probability
			$\gamma_{0}=$ median of $q_{t}$ (2)				$\gamma_{0}=$ median of $q_{t}$ (2)			$\gamma_{0}=$ third quartile of $q_{t}$ (2.674)
$\varphi$	Method	$s$ \ $n$	100	250	500	$\zeta$ \ $n$	100	250	500	100	250	500
1/4	Asym	0.01	0.185	0.145	0.155	0.9	0.608	0.612	0.658	0.740	0.730	0.725
		0.05	0.344	0.293	0.268	0.95	0.687	0.707	0.742	0.813	0.817	0.827
		0.1	0.437	0.379	0.365	0.99	0.831	0.851	0.859	0.905	0.924	0.926
	B/rap	0.01	0.022	0.013	0.021	0.9	0.770	0.836	0.866	0.868	0.882	0.878
		0.05	0.101	0.066	0.071	0.95	0.853	0.894	0.924	0.932	0.943	0.943
		0.1	0.203	0.126	0.133	0.99	0.946	0.972	0.982	0.975	0.984	0.980
1/8	Asym	0.01	0.155	0.098	0.079	0.9	0.661	0.72	0.786	0.771	0.779	0.791
		0.05	0.285	0.207	0.158	0.95	0.745	0.802	0.852	0.852	0.844	0.855
		0.1	0.368	0.275	0.224	0.99	0.86	0.886	0.921	0.925	0.941	0.938
	B/rap	0.01	0.029	0.009	0.017	0.9	0.797	0.871	0.904	0.886	0.891	0.888
		0.05	0.093	0.073	0.065	0.95	0.878	0.917	0.945	0.936	0.946	0.943
		0.1	0.171	0.113	0.109	0.99	0.95	0.981	0.99	0.984	0.984	0.98

Table 3: Monte Carlo size of the test $H_{0}:\gamma=\gamma_{0}$ and coverage probability of confidence intervals for $\gamma_{0}$; model C, kink, $\delta=2$

			Size				Coverage probability
			$\gamma_{0}=$ median of $q_{t}$				$\gamma_{0}=$ median of $q_{t}$			$\gamma_{0}=$ third quartile of $q_{t}$
Model	Method	$s$ \ $n$	100	250	500	$\zeta$ \ $n$	100	250	500	100	250	500
C	Asym	0.01	0.123	0.028	0.005	0.9	0.802	0.946	0.975	0.749	0.925	0.972
		0.05	0.168	0.043	0.015	0.95	0.84	0.965	0.983	0.784	0.945	0.98
		0.1	0.200	0.056	0.024	0.99	0.892	0.982	0.992	0.852	0.966	0.99
	B/rap	0.01	0.027	0.014	0.012	0.9	0.768	0.854	0.805	0.828	0.894	0.877
		0.05	0.091	0.054	0.052	0.95	0.817	0.918	0.889	0.88	0.949	0.943
		0.1	0.153	0.108	0.104	0.99	0.905	0.979	0.975	0.954	0.981	0.984

Table 4: Monte Carlo size of the test $H_{0}:\gamma=\gamma_{0}$ and coverage probability of confidence intervals for $\gamma_{0}$; model A: $q_{t}\neq x_{t}$, homoscedastic error, $\varphi=0$

			Size				Coverage probability
			$\gamma_{0}=$ median of $q_{t}$ (2)				$\gamma_{0}=$ median of $q_{t}$ (2)			$\gamma_{0}=$ third quartile of $q_{t}$ (2.674)
$\delta$	Method	$s$ \ $n$	100	250	500	$\zeta$ \ $n$	100	250	500	100	250	500
$\sqrt{10} / 4$	Asym	0.01	0.0033	0.0032	0.002	0.9	0.969	0.976	0.971	0.969	0.979	0.975
(=0.7906)		0.05	0.0133	0.0109	0.0093	0.95	0.987	0.988	0.987	0.98	0.991	0.986
		0.1	0.0266	0.0219	0.0203	0.99	0.999	0.998	0.998	0.998	0.999	0.997
	B/rap	0.01	0.0104	0.0173	0.0114	0.9	0.837	0.859	0.836	0.839	0.848	0.843
		0.05	0.0691	0.0713	0.0674	0.95	0.87	0.901	0.868	0.87	0.883	0.875
		0.1	0.1353	0.1358	0.1276	0.99	0.935	0.936	0.925	0.926	0.933	0.928
0.25	Asym	0.01	0.016	0.0074	0.0075	0.9	0.88	0.909	0.93	0.879	0.925	0.931
		0.05	0.0599	0.0402	0.0322	0.95	0.938	0.95	0.972	0.927	0.958	0.961
		0.1	0.1102	0.076	0.0648	0.99	0.985	0.992	0.993	0.982	0.994	0.984
	B/rap	0.01	0.0146	0.0075	0.0121	0.9	0.873	0.876	0.894	0.851	0.896	0.897
		0.05	0.0585	0.0518	0.0563	0.95	0.934	0.93	0.939	0.916	0.949	0.943
		0.1	0.1123	0.1024	0.1117	0.99	0.984	0.986	0.992	0.975	0.987	0.981

Equations

y_{t}=\beta^{\prime}x_{t}+\delta^{\prime}x_{t}\mathbf{1}\{q_{t}>\gamma\}+\varepsilon_{t},

x_{t}=(1,x_{t2}^{\prime},q_{t})^{\prime};\quad\delta=(\delta_{1},\delta_{2}^{\prime},\delta_{3})^{\prime},

y_{t}

\widehat{\theta}=(\widehat{\alpha}^{\prime},\widehat{\gamma})^{\prime}:=\underset{\theta\in\Theta}{\arg\min}\,\mathbb{S}_{n}(\theta),

\mathbb{S}_{n}(\theta):=\frac{1}{n}\sum_{t=1}^{n}\left(y_{t}-\alpha^{\prime}x_{t}(\gamma)\right)^{2},

\mathbb{S}_{n}(\gamma):=\frac{1}{n}\sum_{t=1}^{n}\left(y_{t}-\widehat{\alpha}^{\prime}(\gamma)x_{t}(\gamma)\right)^{2},

\widehat{\alpha}(\gamma):=\underset{\alpha\in\Lambda}{\arg\min}\,\frac{1}{n}\sum_{t=1}^{n}\left(y_{t}-\alpha^{\prime}x_{t}(\gamma)\right)^{2}

\widehat{\gamma}:=\underset{\gamma\in\Gamma_{n}}{\arg\min}\,\mathbb{S}_{n}(\gamma).

\widetilde{\theta}=(\widetilde{\alpha}^{\prime},\widetilde{\gamma})^{\prime}:=\underset{\theta\in\Theta:\ \delta_{1}+\delta_{3}\gamma=0;\ \delta_{2}=0}{\arg\min}\,\mathbb{S}_{n}(\theta).

\delta_{10}+\delta_{30}\gamma_{0}=0;\quad\delta_{20}=0.

y_{t}=x_{t}^{\prime}\beta_{0}+\delta_{30}(q_{t}-\gamma_{0})\mathbf{1}_{t}(\gamma_{0})+\varepsilon_{t}.

\widehat{\alpha}-\alpha_{0}=O_{p}\big(n^{-1/2}\big)\quad\text{and}\quad\widehat{\gamma}-\gamma_{0}=O_{p}\big(n^{-1/3}\big).

n^{1/2}(\widehat{\alpha}-\alpha_{0})\overset{d}{\longrightarrow}\mathcal{N}\left(0,M^{-1}\Omega M^{-1}\right),

n^{1/3}(\widehat{\gamma}-\gamma_{0})\overset{d}{\longrightarrow}\underset{g\in\mathbb{R}}{\arg\max}\Big(2\delta_{30}\sqrt{\frac{\sigma^{2}(\gamma_{0})f(\gamma_{0})}{3}}\,W(g^{3})-\frac{\delta_{30}^{2}}{3}f(\gamma_{0})|g|^{3}\Big),

\mathbb{S}_{n}(\theta)-\mathbb{S}_{n}(\theta_{0})

E\left(\delta_{30}q_{t}\mathbf{1}_{t}(0)-(\delta_{1}+\delta_{3}q_{t})\mathbf{1}_{t}(\gamma)\right)^{2}

E\left[q_{t}^{2}\mathbf{1}\{0<q_{t}\leq\gamma\}\right]=\int_{0}^{\gamma}q^{2}f(q)\,dq\sim\frac{c}{3}|\gamma|^{3}

\operatorname{var}\Big(\frac{1}{n}\sum_{t=1}^{n}\varepsilon_{t}\big(\delta_{30}q_{t}\mathbf{1}_{t}(0)-(\delta_{1}+\delta_{3}q_{t})\mathbf{1}_{t}(\gamma)\big)\Big)\sim\frac{\|\delta-\delta_{0}\|^{2}+|\gamma|^{3}}{n}.

\widehat{\delta}-\delta_{0}=O_{p}(n^{-1/2})\quad\text{and}\quad\widehat{\gamma}=O_{p}(n^{-1/3}),

\mathbb{S}_{n}(\theta)-\mathbb{S}_{n}(\theta_{0})

E\left(\delta_{30}q_{t}\mathbf{1}_{t}(0)-\delta_{3}(q_{t}-\gamma)\mathbf{1}_{t}(\gamma)\right)^{2}

\operatorname{var}\Big(\frac{2}{n}\sum_{t=1}^{n}\varepsilon_{t}\big(\delta_{30}q_{t}\mathbf{1}_{t}(0)-\delta_{3}(q_{t}-\gamma)\mathbf{1}_{t}(\gamma)\big)\Big)

\widetilde{\delta}_{3}-\delta_{30}=O_{p}(n^{-1/2})\quad\text{and}\quad\widetilde{\gamma}=O_{p}(n^{-1/2}),

\widehat{M}=\frac{1}{n}\sum_{t=1}^{n}x_{t}(\widehat{\gamma})x_{t}(\widehat{\gamma})^{\prime};\quad\widehat{\Omega}=\frac{1}{n}\sum_{t=1}^{n}x_{t}(\widehat{\gamma})x_{t}(\widehat{\gamma})^{\prime}\widehat{\varepsilon}_{t}^{2},

QLR_{n}=n\frac{\mathbb{S}_{n}(\gamma_{0})-\mathbb{S}_{n}(\widehat{\gamma})}{\mathbb{S}_{n}(\widehat{\gamma})},

QLR_{n}\overset{d}{\longrightarrow}\zeta\max_{g\in\mathbb{R}}\left(2W(g)-|g|\right),

\zeta=\frac{\sigma^{2}(\gamma_{0})}{\sigma^{2}}.

QLR_{n}\overset{d}{\longrightarrow}\xi\max_{g\in\mathbb{R}}\left(2W(g)-|g|\right),

\xi=\frac{E\big(\left(x_{t}^{\prime}d\,\varepsilon_{t}\right)^{2}|q_{t}=\gamma_{0}\big)}{\sigma^{2}E\big(\left(x_{t}^{\prime}d\right)^{2}|q_{t}=\gamma_{0}\big)},

\widehat{\xi}=\frac{\frac{1}{n}\sum_{t=1}^{n}\big(\widehat{\delta}^{\prime}x_{t}\big)^{2}\widehat{\varepsilon}_{t}^{2}K\left(\frac{q_{t}-\widehat{\gamma}}{a}\right)}{\mathbb{S}_{n}\big(\widehat{\theta}\big)\frac{1}{n}\sum_{t=1}^{n}\big(\widehat{\delta}^{\prime}x_{t}\big)^{2}K\left(\frac{q_{t}-\widehat{\gamma}}{a}\right)},

\widehat{\xi}\overset{p}{\longrightarrow}\zeta,


Full text

Robust Inference for Threshold Regression Models

Acknowledgements: We thank anonymous referees and an Associate Editor for their constructive comments. M. Seo gratefully acknowledges the support from the Promising-Pioneering Researcher Program through Seoul National University (SNU) and from the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-0405-20180026).

Javier Hidalgo

London School of Economics

Jungyoon Lee

Royal Holloway, University of London

Myung Hwan Seo

Seoul National University

Abstract

This paper is concerned with inference in threshold regression models when practitioners do not know whether the true specification has a kink or a jump at the threshold point. We nest previous works that assume either continuity or discontinuity at the threshold point and develop robust inference methods on the parameters of the model, which are valid under both specifications. In particular, we find that the parameter values under the kink restriction are irregular points of the Hessian matrix of the expected Gaussian quasi-likelihood. This irregularity destroys the asymptotic normality and induces the nonstandard cube root convergence rate for the threshold estimate. However, it also enables us to obtain the same asymptotic distribution as in Hansen (2000) for the quasi-likelihood ratio statistic for the unknown threshold up to an unknown scale parameter. We show that this scale parameter can be consistently estimated by a kernel method as long as no higher-order kernel is used. Furthermore, we propose to construct confidence intervals for the unknown threshold by bootstrap test inversion, also known as grid bootstrap. The finite-sample performance of the grid bootstrap confidence intervals is examined through Monte Carlo simulations. We also apply our procedure in an empirical economic application.

JEL Classification: C12, C13, C24.

Key words: Change Point, Kink, Grid Bootstrap, Cube Root.

1 INTRODUCTION

This paper examines robust inference in threshold models without a priori knowledge of whether or not the model is continuous at the threshold point. Since their introduction, threshold models have attracted considerable attention in econometrics, statistics and other fields; see Tong (1990) and Hansen (2000), among others. In the time series context, their popularity stems from their ability to capture nonlinear features present in many data sets, such as chaos, cycles and irreversibility. In addition, they have been shown to have superior forecast performance in times of recession; see Tiao and Tsay (1994).

We nest previous works that assume either continuity or discontinuity at the threshold point and develop robust inference methods on the parameters of the model, which are valid under both specifications. When making inference on this type of model, the literature has explicitly assumed that the threshold regression model is either continuous and kinked or discontinuous at the threshold point. For instance, Chan (1993) and Hansen (2000) have focused on inference when the model is discontinuous at the threshold point, whereas Chan and Tsay (1998), Hansen (2017) and Feder (1975a) have focused on inference in kink models. However, there is no a priori reason to believe that the model is or is not continuous. The main motivation for a “unified” or robust inference theory for these models is that their statistical properties differ sharply depending on whether or not the model is estimated under the restriction of continuity. In particular, when the model is estimated under the (true) assumption of continuity, the estimates of all the parameters are square-root-$n$ consistent and asymptotically normal, whereas under discontinuity the least squares estimator of $\gamma$ is super-consistent, asymptotically independent of the slope parameter estimates, and non-Gaussian. It is therefore worthwhile to study the statistical properties of the parameter estimates in a model that nests the continuous and discontinuous frameworks.

We show an interesting property: contrary to what one might expect, the estimator of the threshold parameter fails to be root-$n$ consistent if the model is continuous but the true restriction is not imposed in the estimation procedure. More specifically, we show that the rate of convergence of the threshold estimate becomes $n^{1/3}$, in contrast to the $n^{1/2}$ rate first obtained by Feder (1975a), and in the time series context by Chan and Tsay (1998), by imposing the (true) kink constraint in estimation. The asymptotic distribution of the threshold estimator is no longer normal but that of the $\arg\max$ of a Gaussian process. On the other hand, we find that the unconstrained estimator of the slope parameters is asymptotically independent of the estimator of the threshold point, contrary to the findings in previous works. Asymptotic independence also holds under the jump models of Chan (1993) and Hansen (2000), but not under the constrained estimation of Feder’s (1975) or Chan and Tsay’s (1998) kink models. This finding is new and contrasts with standard results in regression models, where failing to use the (true) restrictions only causes inefficiency: the asymptotic distribution remains Gaussian and the rate of convergence is unchanged. We conclude that statistical inference for threshold regression models depends too heavily on the unverified assumption of a kink versus a jump.

The preceding discussion motivates us to develop robust inference for the threshold regression model. To that end, we first show that a quasi-likelihood ratio statistic for the location of the threshold has the same asymptotic distribution whether the true regression model has a kink or a jump, up to a scale constant whose form depends on which of the two holds. Second, we present an estimator of the scale factor based on the ratio of two Nadaraya-Watson kernel estimators. The consistency of this estimator is standard under the jump model but nonstandard under the kink model, because both its numerator and its denominator converge to zero in probability. However, we prove that, in a manner similar to L’Hôpital’s rule, the ratio of the two degenerating terms still converges in probability to the correct scale factor, under the interesting requirement that higher-order kernels not be used. Third, we show that, when the model has a kink, the asymptotic distribution of the unconstrained estimator of the slope parameters is identical to that under the jump specification, as a result of the asymptotic independence between the estimators of the slope and threshold parameters. This would not be the case if the (correct) kink restriction were imposed in the estimation of the parameters.
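A minimal sketch of the scale estimator and of how it rescales the asymptotic critical values, assuming the kernel-ratio form $\widehat{\xi}=\frac{n^{-1}\sum_{t}(\widehat{\delta}^{\prime}x_{t})^{2}\widehat{\varepsilon}_{t}^{2}K((q_{t}-\widehat{\gamma})/a)}{\mathbb{S}_{n}(\widehat{\theta})\,n^{-1}\sum_{t}(\widehat{\delta}^{\prime}x_{t})^{2}K((q_{t}-\widehat{\gamma})/a)}$ from the paper's equation list. The Epanechnikov kernel (a second-order kernel, in line with the warning against higher-order kernels), the function names and the example inputs are illustrative assumptions. The critical values use the distribution function $P(\max_{g}(2W(g)-|g|)\leq x)=(1-e^{-x/2})^{2}$ given in Hansen (2000).

```python
import numpy as np


def hansen_critical_value(level: float) -> float:
    """Quantile of max_g (2W(g) - |g|), whose CDF is (1 - exp(-x/2))^2."""
    return -2.0 * np.log(1.0 - np.sqrt(level))


def epanechnikov(u: np.ndarray) -> np.ndarray:
    """Second-order kernel (the text rules out higher-order kernels)."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)


def xi_hat(resid, x, q, delta_hat, gamma_hat, bandwidth):
    """Ratio of two kernel-weighted averages estimating the QLR scale.

    resid: LSE residuals; x: (n, k) regressors; delta_hat: length-k
    threshold-effect coefficients; gamma_hat: estimated threshold.
    """
    weights = (x @ delta_hat) ** 2 * epanechnikov((q - gamma_hat) / bandwidth)
    s_n = np.mean(resid ** 2)                      # S_n(theta_hat)
    return np.mean(weights * resid ** 2) / (s_n * np.mean(weights))


def qlr(s_at_gamma, s_at_gamma_hat, n):
    """QLR_n = n (S_n(gamma) - S_n(gamma_hat)) / S_n(gamma_hat)."""
    return n * (s_at_gamma - s_at_gamma_hat) / s_at_gamma_hat


# A candidate gamma stays in the asymptotic confidence set when
# qlr(...) <= xi_hat(...) * hansen_critical_value(level).
print(round(hansen_critical_value(0.95), 2))   # 7.35
```

Because the same limit $\max_{g}(2W(g)-|g|)$ appears under both the kink and the jump specification, only the estimated scale changes between the two cases; the critical value itself does not.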

The last goal of this paper is to present valid bootstrap schemes for constructing confidence sets for the threshold location. The motivation is that the asymptotic critical values can be a poor approximation to the finite-sample ones, as documented by Hansen (2000) and in our Section 5, among others. In addition, the first-order validity of the bootstrap is of theoretical interest: it has not been established even under Hansen’s (2000) shrinking-jump design. This interest stems from two sets of findings in the literature on the failure of the bootstrap for nonstandard estimators: first, for cube-root estimators such as the maximum score estimator, and second, for super-consistent estimators such as the estimator of the autoregressive coefficients of unit root processes and the threshold estimator under Chan’s (1993) model; see Abrevaya and Huang (2005), Seijo and Sen (2011) and Yu (2014), to name a few. Note that the unconstrained threshold estimator belongs to the cube-root class under the kink model and to the super-consistent class under the jump models. In contrast to the bootstrap failures listed above, we show that the proposed bootstrap statistics, which build on the wild bootstrap, correctly approximate the sampling distribution of the scaled quasi-likelihood ratio statistic in our settings. This contrast may be due to the fact that the nuisance parameter in the asymptotic distribution under the non-shrinking model is infinite-dimensional, whereas those in our continuous and shrinking specifications are finite-dimensional scaling terms. Furthermore, to improve the finite-sample coverage probability, we propose a bootstrap test-inversion confidence interval for the threshold, also known as the grid bootstrap of Hansen (1999).
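The test-inversion idea can be sketched as follows: for each candidate threshold on a grid, compute the QLR statistic, simulate its null distribution by a wild bootstrap with the threshold held at that candidate, and keep the candidate if it is not rejected. This is a generic version written for illustration and is not a transcription of the paper's Section 4 algorithm, which is not reproduced here. The Rademacher weights, the residuals from the fit with the threshold fixed at the null value, the unscaled QLR statistic, the candidate grid and the function names are all assumptions.

```python
import numpy as np


def ssr_path(y, x, q, grid):
    """S_n(g): mean squared OLS residual of y on x_t(g), for each g in grid."""
    out = np.empty(len(grid))
    for j, g in enumerate(grid):
        X = np.hstack([x, np.where((q > g)[:, None], x, 0.0)])  # x_t(g)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ coef
        out[j] = r @ r / len(y)
    return out


def grid_bootstrap_ci(y, x, q, grid, level=0.95, B=199, seed=0):
    """Keep each candidate g whose bootstrap p-value for H0: gamma0 = g
    exceeds 1 - level; return the smallest and largest kept values."""
    rng = np.random.default_rng(seed)
    n = len(y)
    s_path = ssr_path(y, x, q, grid)
    kept = []
    for j, g in enumerate(grid):
        stat = n * (s_path[j] - s_path.min()) / s_path.min()   # QLR_n(g)
        X = np.hstack([x, np.where((q > g)[:, None], x, 0.0)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        fit = X @ coef
        resid = y - fit
        boot = np.empty(B)
        for b in range(B):
            eta = rng.choice([-1.0, 1.0], size=n)              # Rademacher weights
            s_star = ssr_path(fit + resid * eta, x, q, grid)   # data under H0
            boot[b] = n * (s_star[j] - s_star.min()) / s_star.min()
        if np.mean(boot >= stat) > 1.0 - level:
            kept.append(g)
    return (min(kept), max(kept)) if kept else None


rng = np.random.default_rng(2)
n = 200
q = rng.normal(size=n)
x = np.column_stack([np.ones(n), q])
y = 0.5 * q + 1.0 * (q > 0.0) + 0.5 * rng.normal(size=n)   # jump at gamma0 = 0
grid = np.quantile(q, np.linspace(0.15, 0.85, 29))
ci = grid_bootstrap_ci(y, x, q, grid, B=49)
print(ci)
```

Returning the smallest and largest non-rejected values is a simplification, since the non-rejected set need not be an interval. B = 49 only keeps the example fast; a much larger number of bootstrap draws would normally be used.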

We then present results of a small Monte Carlo experiment, which show good finite-sample performance of our bootstrap procedure for inference on the threshold location. In our empirical application, we apply our robust inferential method to time series data on real GDP growth and the debt-to-GDP ratio for a number of countries. Numerous studies have fitted jump threshold models to a variety of datasets, see e.g. Caner, Grennes, and Koehler-Geib (2010), Cecchetti, Mohanty, and Zampolli (2011), and Lee et al. (2017), while Hansen (2017) fitted a kink threshold model to US time series data. As economic theory offers little guidance on whether jump or kink models are more suitable, we advocate the use of our robust inference, and we find substantial heterogeneity across countries, not only in the estimated model parameters but also in the presence and location of threshold effects.

In Section 2 we introduce the model, state a set of regularity assumptions and describe how to estimate the parameters of the model. In particular, we examine the properties of the least squares estimator of the parameters when the model is continuous but is estimated without imposing continuity. In Section 3 we then develop robust inferential methods for the model parameters that are valid under both continuous and discontinuous settings, despite the slower rate of convergence of the threshold estimate under the kink specification. In Section 4 we present a bootstrap algorithm for inference on the model parameters and establish its validity. Section 5 presents results of a small Monte Carlo study, followed by Section 6, which contains the empirical application. Section 7 concludes. The appendix contains some of the proofs, and an online supplement presents the remaining proofs, technical lemmas, and more numerical results for Sections 5 and 6.

2 MODEL AND ESTIMATORS

We shall consider the following threshold regression model

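$$y_{t}=\beta^{\prime}x_{t}+\delta^{\prime}x_{t}\mathbf{1}\left\{q_{t}>\gamma\right\}+\varepsilon_{t},\tag{1}$$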

where $\mathbf{1}\left\{\cdot\right\}$ denotes the indicator function and $x_{t}$ is a $k$-dimensional vector of regressors. The parameter $\gamma$ is referred to as the threshold point and takes values in a compact parameter space $\Gamma$, which is a subset of the interior of the support of the threshold variable $q_{t}$. All our results also hold when $q_{t}=t$, which corresponds to structural break models; however, we do not treat this case explicitly, for the sake of clarity and notational simplicity.

We assume that $q_{t}$ is an element of the regressor vector $x_{t}$ and denote

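$$x_{t}=\left(1,x_{t2}^{\prime},q_{t}\right)^{\prime};\qquad\delta=\left(\delta_{1},\delta_{2}^{\prime},\delta_{3}\right)^{\prime},$$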

where $\delta$ is partitioned to match the dimensionality of $x_{t}$. We also abbreviate $\mathbf{1}_{t}\left(\gamma\right)=\mathbf{1}\left\{q_{t}>\gamma\right\}$ and $x_{t}\left(\gamma\right)=\left(x_{t}^{\prime},x_{t}^{\prime}\mathbf{1}_{t}\left(\gamma\right)\right)^{\prime}$, so that we can write (1) as

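$$y_{t}=\alpha^{\prime}x_{t}\left(\gamma\right)+\varepsilon_{t},\qquad\alpha=\left(\beta^{\prime},\delta^{\prime}\right)^{\prime}.$$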
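To make the notation concrete, here is a minimal sketch of how the stacked regressor $x_{t}(\gamma)=(x_{t}^{\prime},x_{t}^{\prime}\mathbf{1}_{t}(\gamma))^{\prime}$ might be built for all $t$ at once. The function name and the array layout (one row per observation) are illustrative choices, not taken from the paper.

```python
import numpy as np


def threshold_design(x: np.ndarray, q: np.ndarray, gamma: float) -> np.ndarray:
    """Return the (n, 2k) matrix whose t-th row is x_t(gamma)'.

    The second block copies x_t when q_t > gamma (the indicator 1_t(gamma))
    and is zero otherwise.
    """
    above = (q > gamma)[:, None]                  # 1_t(gamma) as a column
    return np.hstack([x, np.where(above, x, 0.0)])


# Example with k = 2 and x_t = (1, q_t)'.
q = np.array([-1.0, 0.5, 2.0])
x = np.column_stack([np.ones_like(q), q])
X_gamma = threshold_design(x, q, gamma=0.0)
# Row 1 has q_1 <= 0, so its second block is zero; rows 2 and 3 repeat x_t.
```

Using `np.where` rather than multiplying by the indicator avoids signed zeros such as `-0.0` in the inactive block.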

Before stating some regularity assumptions on the model, we need to introduce some extra notation. Let $f\left(\cdot\right)$ denote the density function of $q_{t}$, which we assume to exist, let $\sigma^{2}\left(\gamma\right)=E\left(\varepsilon_{t}^{2}\mid q_{t}=\gamma\right)$ be the conditional variance function of the error term, and let $\sigma^{2}=E(\varepsilon_{t}^{2})$ denote the unconditional variance. Denote the $k\times k$ matrices $D\left(\gamma\right)=E\left(x_{t}x_{t}^{\prime}|q_{t}=\gamma\right)$ and $V\left(\gamma\right)=E\left(x_{t}x_{t}^{\prime}\varepsilon_{t}^{2}|q_{t}=\gamma\right)$, and let $D=D\left(\gamma_{0}\right)$ and $V=V\left(\gamma_{0}\right)$. As usual, the subscript “0” on a parameter indicates its true unknown value. Finally, let $M=E(\mathbf{x}_{t}\mathbf{x}_{t}^{\prime})$ and $\Omega=E(\mathbf{x}_{t}\mathbf{x}_{t}^{\prime}\varepsilon_{t}^{2})$ with $\mathbf{x}_{t}=x_{t}\left(\gamma_{0}\right)$.

Assumption Z.

Let $\left\{x_{t},\varepsilon_{t}\right\}_{t\in\mathbb{Z}}$ be a strictly stationary, ergodic sequence of random variables such that their $\rho$ -mixing coefficients satisfy $\sum_{m=1}^{\infty}\rho_{m}^{1/2}<\infty$ and $E\left(\varepsilon_{t}|{{\mathcal{F}}}_{t-1}\right)=0$ , where ${{\mathcal{F}}}_{t}$ is the filtration up to time $t$ . Furthermore, $M,\Omega>0$ , $E\left\|x_{t}\right\|^{4}<\infty$ , $E\left\|x_{t}\varepsilon_{t}\right\|^{4}<\infty$ and $E\left|\varepsilon_{t}\right|^{4+\eta}<\infty$ for some $\eta>0$ .

Assumption Q.

The functions $f\left(\gamma\right)$, $V\left(\gamma\right)$ and $D\left(\gamma\right)$ are continuous at $\gamma=\gamma_{0}$. For all $\gamma\in\Gamma$, the functions $f\left(\gamma\right)$, $E\big(x_{t}x_{t}^{\prime}\mathbf{1}\left\{q_{t}\leq\gamma\right\}\big)$ and $E\left(x_{t2}x_{t2}^{\prime}|q_{t}=\gamma\right)$ are positive and continuous, and the functions $f\left(\gamma\right)$, $E\big(|x_{t}|^{4}|q_{t}=\gamma\big)$ and $E\big(|x_{t}\varepsilon_{t}|^{4}|q_{t}=\gamma\big)$ are bounded by some $C<\infty$.

Assumptions Z and Q are commonly imposed on the distribution of $\left\{x_{t},\varepsilon_{t}\right\}$, see e.g. Hansen (2000), so his comments apply here. As discussed therein, the self-exciting threshold autoregressive model of Tong (1990) satisfies Assumption Z. The condition on $E\left(x_{t2}x_{t2}^{\prime}|q_{t}=\gamma\right)$ is stated in terms of $x_{t2}$ because the other elements of $x_{t}$ are fixed given $q_{t}=\gamma$. While we allow conditional heteroscedasticity of a general form, Assumption Q requires the conditional variance function $\sigma^{2}(\cdot)$ to be continuous at $\gamma_{0}$.
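For the time-series case mentioned above, a two-regime SETAR(1) model fits (1) with $x_{t}=(1,y_{t-1})^{\prime}$ and $q_{t}=y_{t-1}$, so that $q_{t}$ is an element of $x_{t}$. The sketch below simulates one. The parameter values, burn-in length and Gaussian errors are hypothetical choices; they make the autoregressive coefficient 0.5 in one regime and $0.5-0.8=-0.3$ in the other, both below one in absolute value, which we take as sufficient for a stationary solution (assumed here rather than verified).

```python
import numpy as np


def simulate_setar(n, beta, delta, gamma, sigma=1.0, burn=200, seed=0):
    """Simulate y_t = beta'x_t + delta'x_t 1{q_t > gamma} + eps_t
    with x_t = (1, y_{t-1})', q_t = y_{t-1} and eps_t ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    path = np.zeros(n + burn + 1)
    for t in range(1, n + burn + 1):
        x_t = np.array([1.0, path[t - 1]])
        coef = beta + delta if path[t - 1] > gamma else beta   # regime switch
        path[t] = coef @ x_t + sigma * rng.standard_normal()
    path = path[burn:]                  # discard burn-in, keep n + 1 values
    return path[1:], path[:-1]          # (y_t, q_t = y_{t-1}), each length n


y, q = simulate_setar(500, beta=np.array([0.0, 0.5]),
                      delta=np.array([1.0, -0.8]), gamma=0.0)
```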

2.1 Estimators

We estimate $\theta_{0}=\left(\alpha_{0}^{\prime},\gamma_{0}\right)^{\prime}$ by the (non-linear) least squares estimator (LSE), that is,

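$$\widehat{\theta}=\left(\widehat{\alpha}^{\prime},\widehat{\gamma}\right)^{\prime}:=\underset{\theta\in\Theta}{\arg\min}\,\mathbb{S}_{n}\left(\theta\right),$$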

where $\Theta=\left(\Lambda,\Gamma\right)$ is a compact set in $\mathbb{R}^{2k+1}$ and

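$$\mathbb{S}_{n}\left(\theta\right):=\frac{1}{n}\sum_{t=1}^{n}\left(y_{t}-\alpha^{\prime}x_{t}\left(\gamma\right)\right)^{2},$$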

which is a step function in $\gamma$, with jumps at the $q_{t}$’s. To compute the estimator we employ a step-wise algorithm: $\widehat{\gamma}$ can be found by a grid search over $\Gamma_{n}=\Gamma\cap\left\{q_{1},...,q_{n}\right\}$. Define the concentrated sum of squared residuals

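$$\mathbb{S}_{n}\left(\gamma\right):=\frac{1}{n}\sum_{t=1}^{n}\left(y_{t}-\widehat{\alpha}^{\prime}\left(\gamma\right)x_{t}\left(\gamma\right)\right)^{2},$$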

where

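$$\widehat{\alpha}\left(\gamma\right):=\underset{\alpha\in\Lambda}{\arg\min}\,\frac{1}{n}\sum_{t=1}^{n}\left(y_{t}-\alpha^{\prime}x_{t}\left(\gamma\right)\right)^{2}$$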

is the LSE of $\alpha$ for a given $\gamma$ . Then, our estimator of $\alpha$ is $\widehat{\alpha}:=\widehat{\alpha}\left(\widehat{\gamma}\right)$ , with

$$\widehat{\gamma}:=\underset{\gamma\in\Gamma_{n}}{\arg\min}\,\mathbb{S}_{n}\left(\gamma\right).$$

Since the set of minimizers is an interval, it is common to take the estimator to be its maximum. This is the unconstrained LSE. For comparison, we also describe the continuity-constrained least squares estimator (CLSE), which minimizes $\mathbb{S}_{n}\left(\theta\right)$ subject to the continuity restriction of Assumption C in the next section:

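$$\widetilde{\theta}=\left(\widetilde{\alpha}^{\prime},\widetilde{\gamma}\right)^{\prime}:=\underset{\theta\in\Theta:\ \delta_{1}+\delta_{3}\gamma=0;\ \delta_{2}=0}{\arg\min}\,\mathbb{S}_{n}\left(\theta\right).$$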

This estimator was considered by Feder (1975a) and later by Chan and Tsay (1998) and Hansen (2017), who established the asymptotic normality of $\widetilde{\theta}$ with the standard square-root-$n$ consistency.
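Both estimators can be sketched with a grid search, assuming $x_{t}=(1,q_{t})^{\prime}$ (no $x_{t2}$) in the example. For the LSE, $\Gamma_{n}$ is taken as the sample values of $q_{t}$ between the 15% and 85% sample quantiles; this trimming is a hypothetical stand-in for $\Gamma$. Under the restriction $\delta_{1}+\delta_{3}\gamma=0$, $\delta_{2}=0$ the threshold term becomes $\delta_{3}(q_{t}-\gamma)\mathbf{1}_{t}(\gamma)$, and because the CLSE criterion is continuous in $\gamma$, a fine user-supplied grid is used as an approximation. Ties in the LSE are broken toward the largest minimizer, one reading of the convention stated above. All names and numbers are illustrative.

```python
import numpy as np


def ols_ssr(y, X):
    """Return (mean squared residual, OLS coefficients) of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return resid @ resid / len(y), coef


def lse(y, x, q, trim=0.15):
    """Unconstrained LSE by grid search over trimmed sample values of q."""
    lo, hi = np.quantile(q, [trim, 1.0 - trim])
    best = (np.inf, None, None)
    for g in np.sort(q[(q >= lo) & (q <= hi)]):
        X = np.hstack([x, np.where((q > g)[:, None], x, 0.0)])   # x_t(g)
        s, coef = ols_ssr(y, X)
        if s <= best[0]:                 # '<=' keeps the largest minimizer
            best = (s, g, coef)
    return best                          # (S_n(gamma_hat), gamma_hat, alpha_hat)


def clse(y, x, q, grid):
    """Continuity-constrained LSE: delta_1 = -delta_3*gamma, delta_2 = 0."""
    best = (np.inf, None, None)
    for g in grid:
        X = np.hstack([x, np.maximum(q - g, 0.0)[:, None]])  # (q_t - g) 1_t(g)
        s, coef = ols_ssr(y, X)
        if s < best[0]:
            best = (s, g, coef)
    return best                          # coef = (beta', delta_3)'


rng = np.random.default_rng(1)
n = 2000
q = rng.normal(size=n)
x = np.column_stack([np.ones(n), q])
y = 1.0 + 0.5 * q + 3.0 * np.maximum(q - 0.3, 0.0) + 0.5 * rng.normal(size=n)
s_u, gamma_hat, alpha_hat = lse(y, x, q)
s_c, gamma_tilde, coef_tilde = clse(y, x, q, np.linspace(-1.0, 1.0, 401))
```

Since the constrained regressor $(q_{t}-\gamma)\mathbf{1}_{t}(\gamma)$ lies in the span of $x_{t}(\gamma)$, the unconstrained criterion can never exceed the constrained one at a comparable threshold; the two estimators differ in their rates, not in fit.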

3 Robust Confidence Regions

This section presents our main results, namely how to perform robust inference in threshold models, in particular on the location of the threshold point. We begin by developing inference methods for the regression coefficients $\alpha_{0}$ and the unknown threshold $\gamma_{0}$ based on the LSE $\widehat{\theta}$ when the true regression model has a kink. We then compare them with inference methods developed under different sampling schemes, such as that of Hansen (2000). In particular, we show that a judicious choice of statistics enables robust inference, in the sense that the same critical values can be employed whether the model has a kink or a jump. That is, we do not need to know whether the model has a kink or a jump to make inference on the parameters $\alpha_{0}$ and $\gamma_{0}$. As mentioned in the introduction, the motivation comes from the rather surprising results given in Proposition 1 and Theorem 1 below.

First, we state the kink model as an assumption.

Assumption C.

Assume that $\delta_{30}\neq 0$ and

$$\delta_{10}+\delta_{30}\gamma_{0}=0;\qquad\delta_{20}=0.\tag{10}$$

Under Assumption C the model (2) is written as

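$$y_{t}=x_{t}^{\prime}\beta_{0}+\delta_{30}\left(q_{t}-\gamma_{0}\right)\mathbf{1}_{t}\left(\gamma_{0}\right)+\varepsilon_{t}.\tag{11}$$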

Feder (1975), Chan and Tsay (1998), and Hansen (2017) considered the estimation of the model (11) along with an auxiliary condition of $\delta_{30}\neq 0$ to ensure the identification of the change-point $\gamma_{0}$ . This is a model with a kink.
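To see how the restriction in Assumption C produces the kink model, the sketch below builds $\delta_{0}$ from $(\delta_{30},\gamma_{0})$ and checks numerically that $\delta_{0}^{\prime}x_{t}\mathbf{1}_{t}(\gamma_{0})=\delta_{30}(q_{t}-\gamma_{0})\mathbf{1}_{t}(\gamma_{0})$, and that $\delta_{0}^{\prime}x_{t}$ vanishes at $q_{t}=\gamma_{0}$ whatever the value of $x_{t2}$. The function name and all numbers are illustrative.

```python
import numpy as np


def kink_delta(delta30, gamma0, k2):
    """delta_0 = (delta_10, delta_20', delta_30)' with delta_10 = -delta_30*gamma0
    and delta_20 = 0, for x_t = (1, x_t2', q_t)' where dim(x_t2) = k2."""
    return np.concatenate([[-delta30 * gamma0], np.zeros(k2), [delta30]])


rng = np.random.default_rng(0)
n, k2, gamma0, delta30 = 6, 2, 0.4, 1.5
q = rng.normal(size=n)
x = np.column_stack([np.ones(n), rng.normal(size=(n, k2)), q])
d0 = kink_delta(delta30, gamma0, k2)

above = q > gamma0
jump_term = (x @ d0) * above                    # delta_0' x_t 1_t(gamma0)
kink_term = delta30 * (q - gamma0) * above      # threshold term of the kink model
at_threshold = d0 @ np.array([1.0, 0.3, -0.7, gamma0])   # q_t = gamma0, any x_t2
```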

The next proposition establishes the consistency and rates of convergence of the LSE $\widehat{\theta}$ defined in Section 2.1 under Assumption C.

Proposition 1.

Under Assumptions C, Z and Q, we have that

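$$\widehat{\alpha}-\alpha_{0}=O_{p}\big(n^{-1/2}\big)\quad\text{and}\quad\widehat{\gamma}-\gamma_{0}=O_{p}\big(n^{-1/3}\big).$$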

The results of Proposition 1 are surprising because the convergence rate of $\widehat{\gamma}$ is slower than that of the CLSE $\widetilde{\gamma}$ , which is known to be $n^{-1/2}$ as shown in the aforementioned works. That is, using the true restriction on the parameters leads to a faster rate of convergence of the estimator of $\gamma_{0}$ , not just reducing its asymptotic variance as is often the case.

Next we present the asymptotic distribution of $\widehat{\theta}$ .

Theorem 1.

Let Assumptions C, Z and Q hold and $B_{1}\left(\cdot\right)$ and $B_{2}\left(\cdot\right)$ be two independent standard Brownian motions. Define $W\left(g\right):=B_{1}\left(-g\right)\mathbf{1}\left\{g<0\right\}+B_{2}\left(g\right)\mathbf{1}\left\{g>0\right\}$ . Then,

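$$n^{1/2}\left(\widehat{\alpha}-\alpha_{0}\right)\overset{d}{\longrightarrow}\mathcal{N}\left(0,M^{-1}\Omega M^{-1}\right),$$
$$n^{1/3}\left(\widehat{\gamma}-\gamma_{0}\right)\overset{d}{\longrightarrow}\underset{g\in\mathbb{R}}{\arg\max}\Big(2\delta_{30}\sqrt{\frac{\sigma^{2}(\gamma_{0})f(\gamma_{0})}{3}}\,W(g^{3})-\frac{\delta_{30}^{2}}{3}f(\gamma_{0})|g|^{3}\Big),$$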

where the two limit distributions are independent of each other.

The asymptotic independence is a consequence of the different convergence rates of the two sets of estimators $\widehat{\alpha}$ and $\widehat{\gamma}$, by arguments similar to those in Chan (1993), although in our case $\widehat{\gamma}$ converges more slowly than $\widehat{\alpha}$. The asymptotic independence does not hold for the CLSE $\widetilde{\gamma}$ and $\widetilde{\alpha}$, which, as mentioned above, converge at the same rate and are jointly asymptotically normal with a non-diagonal covariance matrix.
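Because the limit of $n^{1/2}(\widehat{\alpha}-\alpha_{0})$ in Theorem 1 is $\mathcal{N}(0,M^{-1}\Omega M^{-1})$ and is asymptotically independent of $\widehat{\gamma}$, standard errors for $\widehat{\alpha}$ can be formed as if the threshold were known. A minimal sketch, assuming $M$ and $\Omega$ are replaced by sample analogues evaluated at $x_{t}(\widehat{\gamma})$ with LSE residuals (an HC0-type sandwich). The function name and simulated data are hypothetical, and the true threshold is plugged in only to keep the example short.

```python
import numpy as np


def alpha_sandwich_se(y, x, q, gamma_hat):
    """OLS of y on x_t(gamma_hat) with heteroscedasticity-robust SEs
    from M_hat^{-1} Omega_hat M_hat^{-1} / n."""
    n = len(y)
    X = np.hstack([x, np.where((q > gamma_hat)[:, None], x, 0.0)])
    alpha_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ alpha_hat
    M_hat = X.T @ X / n
    Omega_hat = (X * resid[:, None] ** 2).T @ X / n   # (1/n) sum x x' e^2
    M_inv = np.linalg.inv(M_hat)
    cov = M_inv @ Omega_hat @ M_inv / n
    return alpha_hat, np.sqrt(np.diag(cov))


rng = np.random.default_rng(4)
n = 500
q = rng.normal(size=n)
x = np.column_stack([np.ones(n), q])
eps = (1.0 + 0.5 * np.abs(q)) * rng.normal(size=n)    # heteroscedastic errors
y = 1.0 + 0.5 * q + 2.0 * np.maximum(q, 0.0) + eps     # kink at gamma0 = 0
alpha_hat, se = alpha_sandwich_se(y, x, q, gamma_hat=0.0)
```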

Theorem 1 suggests that Gonzalo and Wolf’s (2005) subsampling procedure would be correct had they used the normalization $n^{1/3}$ instead of the incorrect $n^{1/2}$. It is also worth mentioning that Seo and Linton (2007) considered a smoothed least squares estimator in the same setup. Under their assumptions on the smoothing parameter, the convergence rate of their smoothed least squares estimator of $\gamma$ is slower than our cube-root rate.

Remark 1.

We now present a heuristic discussion to illustrate why the constrained and unconstrained estimators of $\gamma_{0}$ have different rates of convergence, and why the unconstrained estimator belongs to the cube-root class explored by Kim and Pollard (1990) for i.i.d. data and by Seo and Otsu (2018) for more general setups. For simplicity of illustration, we consider a simplified model where $x_{t}=\left(1,q_{t}\right)^{\prime}$, $\delta=\left(\delta_{1},\delta_{3}\right)^{\prime}$, $\beta$ is fixed at $\beta_{0}=0$, and thus $\theta=\left(\delta^{\prime},\gamma\right)^{\prime}$. In addition, we assume $\gamma_{0}=0$, and thus $\delta_{10}=0$ by (10), without loss of generality, since we can always rename the variable $q_{t}-\gamma_{0}$ as $q_{t}$. It is well known that the rate of convergence of an M-estimator is governed by the local behavior of its criterion function around the true value, provided that the estimator is consistent. Hence the convergence rate of the LSE $\widehat{\theta}=\left(\widehat{\delta}^{\prime},\widehat{\gamma}\right)^{\prime}$ is determined by the stochastic expansion of

[TABLE]

in small neighborhoods of $\delta=\delta_{0}$ and $\gamma=\gamma_{0}=0$ . Consider $\gamma>0$ . The case of $\gamma<0$ is handled similarly. Then, as $\mathbf{1}_{t}\left(0\right)=\mathbf{1}_{t}\left(\gamma\right)+\mathbf{1}\left\{0<q_{t}\leq\gamma\right\}$ and $\mathbf{1}_{t}\left(\gamma\right)\mathbf{1}\left\{0<q_{t}\leq\gamma\right\}=0$ ,

[TABLE]

because for some positive constant $c$ ,

[TABLE]

due to Assumption Q. This cubic approximation at $\gamma=\gamma_{0}$ is non-standard and invalidates the asymptotic normality of $\widehat{\gamma}$, which relies on a quadratic approximation. [Footnote 1: This also shows that the asymptotic variance formula $U^{-1}VU^{-1}$ in Gonzalo and Wolf's (2005) Theorem A.1 and Remark A.1 is not properly defined due to the degeneracy of $U$, the second derivative matrix of the expected criterion function evaluated under the continuity restriction.] Similarly,

[TABLE]

Thus, the last two displayed expressions suggest that

[TABLE]

as these rates of convergence balance the speeds at which the bias and the standard deviation of $\mathbb{S}_{n}\left(\theta\right)-\mathbb{S}_{n}\left(\theta_{0}\right)$ converge to zero. In comparison, the CLSE $\left(\widetilde{\delta}_{3},\widetilde{\gamma}\right)^{\prime}$ is governed by

[TABLE]

due to the continuity constraint (10), for which we observe the quadratic expansion

[TABLE]

This yields that

[TABLE]

which coincides with the rates of convergence obtained by Feder (1975a, b) and by Chan and Tsay $(1998)$.
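As a rough check on these rates, the balancing step can be written out under our own simplifying assumption (not one of the displays above): near $\gamma_{0}=0$, the event $\{0<q_{t}\leq\gamma\}$ has probability of order $\gamma$ and $x_{t}^{\prime}\delta_{0}=\delta_{30}q_{t}$ is of order $\gamma$ on it, so the centred unconstrained criterion behaves like $\gamma^{3}$ with a standard deviation of order $(\gamma^{3}/n)^{1/2}$. For the CLSE we assume the regular parametric orders, $\gamma^{2}$ for the bias and $\gamma n^{-1/2}$ for the noise. Equating bias and noise then gives the two rates:

```latex
% Heuristic orders near gamma_0 = 0, gamma > 0 (illustrative assumption only)
% unconstrained LSE: bias ~ gamma^3, standard deviation ~ (gamma^3/n)^{1/2}
\gamma^{3}\asymp\left(\gamma^{3}/n\right)^{1/2}
\;\Longrightarrow\;\gamma^{3}\asymp n^{-1}
\;\Longrightarrow\;\widehat{\gamma}-\gamma_{0}=O_{p}\left(n^{-1/3}\right),
\qquad
% constrained LSE: bias ~ gamma^2, standard deviation ~ gamma n^{-1/2}
\gamma^{2}\asymp\gamma\,n^{-1/2}
\;\Longrightarrow\;\widetilde{\gamma}-\gamma_{0}=O_{p}\left(n^{-1/2}\right).
```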

An intuitive explanation for the preceding Proposition, Theorem and Remark appeals to “misspecification”. Although the unconstrained model (1) encompasses both continuous and discontinuous models, the estimated regression function is almost surely discontinuous, since the probability that the LSE $\widehat{\theta}$ satisfies the continuity restriction is zero.

3.1 Inference on Regression Coefficient $\alpha$

Theorem 1 above, Lemma A.12 of Hansen $\left(2000\right)$ and Theorem 2 of Chan $\left(1993\right)$ report the same asymptotic distribution for $\widehat{\alpha}$, namely $\mathcal{N}\left(0,M^{-1}\Omega M^{-1}\right)$, which is asymptotically independent of $\widehat{\gamma}$. Thus, the inference for $\alpha_{0}$ is uniform under any widely used sampling scheme with strongly identified $\gamma_{0}$, provided that the respective sample moments

[TABLE]

where $\widehat{\varepsilon}_{t}=y_{t}-x_{t}\left(\widehat{\gamma}\right)^{\prime}\widehat{\alpha}$, are consistent under each data generating process. This holds by the uniform law of large numbers, which requires only the consistency of $\widehat{\gamma}$.
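A minimal sketch of the plug-in sandwich estimate of $M^{-1}\Omega M^{-1}$ evaluated at $\widehat{\gamma}$ follows. It assumes that $x_{t}\left(\gamma\right)=\left(x_{t}^{\prime},x_{t}^{\prime}\mathbf{1}\left\{q_{t}>\gamma\right\}\right)^{\prime}$ and that the errors are serially uncorrelated, so that $\Omega$ can be estimated by the heteroskedasticity-robust outer product. The function names are hypothetical; with dependent data, $\widehat{\Omega}$ would have to be replaced by a long-run variance estimator.

```python
import numpy as np

def threshold_design(X, q, gamma):
    """Assumed x_t(gamma): stack x_t and x_t * 1{q_t > gamma} column-wise."""
    ind = (q > gamma).astype(float)[:, None]
    return np.hstack([X, X * ind])

def alpha_with_sandwich_se(y, X, q, gamma_hat):
    """OLS of y on x_t(gamma_hat), treating gamma_hat as known, plus
    heteroskedasticity-robust standard errors from M^{-1} Omega M^{-1} / n.
    Assumes serially uncorrelated errors."""
    Z = threshold_design(X, q, gamma_hat)
    n = len(y)
    alpha_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)
    resid = y - Z @ alpha_hat
    M_hat = Z.T @ Z / n                               # sample analogue of M
    Omega_hat = (Z * resid[:, None] ** 2).T @ Z / n   # sample analogue of Omega
    M_inv = np.linalg.inv(M_hat)
    cov = M_inv @ Omega_hat @ M_inv / n               # approximate Var(alpha_hat)
    return alpha_hat, np.sqrt(np.diag(cov))
```

Because of the oracle property, plugging in $\widehat{\gamma}$ and proceeding as if it were $\gamma_{0}$ is exactly what the sketch does.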

It is worthwhile to mention that this “oracle” property of $\widehat{\alpha}$ does not hold true for the CLSE $\widetilde{\alpha}$ , whose asymptotic distribution is affected by that of $\widetilde{\gamma}$ , as was first noticed and shown by Feder $(1975a)$ and later extended to time series data by Chan and Tsay $(1998)$ .

3.2 Inference on Threshold $\gamma$

The main purpose of this section is to develop a method for constructing confidence regions for $\gamma_{0}$ that are valid regardless of whether the regression model has a kink or a jump at $\gamma_{0}$. Conventionally, inference on $\gamma$ has been conducted after assuming either that the model has a kink or that it has a jump, i.e. the practitioner chooses between the jump and kink models before estimating the threshold point. More specifically, if one decides that the model has a jump, one follows e.g. Hansen $\left(2000\right)$, whereas if one has chosen the kink model, one employs asymptotic normal inference as in Feder $\left(1975a\right)$ and others. One of our findings is that Hansen's $\left(2000\right)$ results are not valid if the model has a kink and, likewise, Feder's results are not valid if the model has a jump.

Thus, this section develops robust confidence regions that are valid regardless of which of the two models is the true specification. For ease of reference, we recall Hansen's (2000) diminishing jump specification:

Assumption J.

For some $0<\varphi<1/2$ and $d\neq 0$, $\delta_{0}=d\cdot n^{-\varphi}$, and $d^{\prime}Vd$ and $d^{\prime}Dd$ are positive for all $n$.

When $\varphi\geq 1/2$, $\delta_{0}$ is too small for $\gamma_{0}$ to be estimated consistently, so this case is excluded. We also suppress the dependence of $\delta_{0}$ on the sample size $n$ to simplify the notation.

To develop robust confidence sets, we need a statistic whose asymptotic distribution is invariant to the true parameter value, that is, a statistic whose asymptotic distribution does not change abruptly under Assumption C. We begin by introducing a Gaussian quasi-likelihood ratio statistic based on the unconstrained model (1). Specifically, let

[TABLE]

where $\widehat{\mathbb{S}}_{n}\left(\gamma\right)$ is defined in (6).
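For concreteness, here is a sketch of the profiled criterion and the statistic on a grid of candidate thresholds. It assumes, as in Hansen's (2000) likelihood ratio statistic, that $\widehat{\mathbb{S}}_{n}\left(\gamma\right)$ is the mean squared residual from regressing $y_{t}$ on $\left(x_{t}^{\prime},x_{t}^{\prime}\mathbf{1}\left\{q_{t}>\gamma\right\}\right)^{\prime}$ and that $QLR_{n}\left(\gamma\right)=n\left(\widehat{\mathbb{S}}_{n}\left(\gamma\right)-\widehat{\mathbb{S}}_{n}\left(\widehat{\gamma}\right)\right)/\widehat{\mathbb{S}}_{n}\left(\widehat{\gamma}\right)$, with $\widehat{\gamma}$ the grid minimizer. The helper names are hypothetical.

```python
import numpy as np

def concentrated_ssr(y, X, q, gamma):
    """Assumed S_n(gamma): mean squared OLS residual of y on (x_t, x_t 1{q_t > gamma})."""
    Z = np.hstack([X, X * (q > gamma)[:, None]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return resid @ resid / len(y)

def qlr_profile(y, X, q, grid):
    """Grid LSE of gamma and QLR_n(gamma) = n (S_n(gamma) - S_n(gamma_hat)) / S_n(gamma_hat)."""
    grid = np.asarray(grid)
    ssr = np.array([concentrated_ssr(y, X, q, g) for g in grid])
    j = int(np.argmin(ssr))                  # grid minimizer gives gamma_hat
    qlr = len(y) * (ssr - ssr[j]) / ssr[j]   # zero at gamma_hat, non-negative elsewhere
    return grid[j], qlr
```

A trimmed grid, for example equidistant points between central quantiles of the observed $q_{t}$, can be passed as `grid`.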

We now derive the following asymptotic distribution for $QLR_{n}$ , which contrasts with the asymptotic distribution obtained by Hansen $(2000)$ under Assumption J.

Proposition 2.

Suppose that Assumptions C, Z and Q hold. Then, as $n\rightarrow\infty$ ,

[TABLE]

where

[TABLE]

In comparison, we recall Hansen’s (2000) results that

[TABLE]

where

[TABLE]

and that the distribution function of $\max_{g\in\mathbb{R}}\left(2W\left(g\right)-\left|g\right|\right)$ is given by $F\left(z\right)=\left(1-e^{-z/2}\right)^{2}$ .
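The quantiles of $F$ have a closed form: solving $F\left(z\right)=s$ gives $c\left(s\right)=-2\log\left(1-\sqrt{s}\right)$. The short sketch below (function names are ours) evaluates the CDF and its inverse; these quantiles serve as the asymptotic critical values for the rescaled statistic.

```python
import math

def hansen_cdf(z):
    """F(z) = (1 - exp(-z/2))^2 for z >= 0, the CDF of max_g (2W(g) - |g|)."""
    return (1.0 - math.exp(-z / 2.0)) ** 2 if z > 0 else 0.0

def hansen_critical_value(s):
    """s-quantile of F: solve (1 - exp(-c/2))^2 = s, i.e. c = -2 log(1 - sqrt(s))."""
    return -2.0 * math.log(1.0 - math.sqrt(s))

for s in (0.90, 0.95, 0.99):
    print(s, round(hansen_critical_value(s), 2))
# 0.9 5.94
# 0.95 7.35
# 0.99 10.59
```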

Proposition 2 and Hansen's result recalled above indicate that the only difference between the limit distributions of $QLR_{n}$ under the kink and jump specifications is the scaling factor. This is the case despite the fact that the estimator $\widehat{\gamma}$ exhibits different rates of convergence in the two settings.

Next, we propose an estimator of the unknown scaling of $QLR_{n}$ that converges in probability to $\xi$ under Assumption J, while it converges to $\zeta$ under Assumption C, thus adapting to the unknown true scaling in each situation. We begin with a natural estimator of $\xi$ , which is a ratio of two Nadaraya-Watson estimators of the conditional expectations. That is,

[TABLE]

where $K\left(\cdot\right)$ and $a$ are, respectively, the kernel function and the bandwidth parameter, and the $\widehat{\varepsilon}_{t}$'s are the least squares residuals. The consistency of $\widehat{\xi}$ for $\xi$ is standard, as argued in Hansen $(2000)$.
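A minimal sketch of such a kernel ratio is given below. It rests on our assumption that the ratio mirrors the kernel approach of Hansen (2000): the Nadaraya-Watson estimate at $q=\widehat{\gamma}$ of $E\left((\widehat{\delta}^{\prime}x_{t})^{2}\widehat{\varepsilon}_{t}^{2}\mid q_{t}\right)$, divided by $\widehat{\sigma}^{2}$ times that of $E\left((\widehat{\delta}^{\prime}x_{t})^{2}\mid q_{t}\right)$. The common normalising sum $\sum_{t}K\left((q_{t}-\widehat{\gamma})/a\right)$ cancels, so it is omitted. The display above is authoritative; the names here are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """K(u) = 0.75 (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)

def xi_hat(resid, X, delta_hat, q, gamma_hat, a, kernel=epanechnikov):
    """Assumed kernel ratio for the scale factor:
    sum_t K_t (delta' x_t)^2 e_t^2 / (sigma2_hat * sum_t K_t (delta' x_t)^2),
    with K_t = K((q_t - gamma_hat) / a) and sigma2_hat = mean(e_t^2)."""
    w = kernel((q - gamma_hat) / a) * (X @ delta_hat) ** 2
    sigma2_hat = np.mean(resid ** 2)
    return np.sum(w * resid ** 2) / (sigma2_hat * np.sum(w))
```

Under homoskedasticity the ratio is close to one. With errors whose variance rises away from $\widehat{\gamma}$, it falls below one, which is the scaling that the statistic must absorb.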

However, it is not trivial to establish that $\widehat{\xi}\overset{p}{\longrightarrow}\zeta$ when the true model has a kink at $\gamma_{0}$, because both the numerator and the denominator degenerate asymptotically under Assumption C. It turns out that we need to impose some unconventional restrictions on the kernel function $K$ and the bandwidth $a$. Specifically, we assume

Assumption K.

Assume the following for $K\left(\cdot\right)$ and $a.$

$\mathbf{K1}$

$K\left(\cdot\right)$ is symmetric and $\kappa_{\ell}=\int_{-\infty}^{\infty}u^{\ell}K\left(u\right)du<C$ for $\ell\leq 4$ and $\kappa_{2}\neq 0$ .

$\mathbf{K2}$

$K\left(\cdot\right)$ is twice continuously differentiable with first derivative $K^{\prime}\left(\cdot\right)$, and for all $u$ such that $\left|w/u\right|\leq C$, $K^{\prime}\left(u+w\right)/K^{\prime}\left(u\right)\rightarrow 1$ as $w\rightarrow 0$.

$\mathbf{K3}$

$K\left(u\right)=\int\phi\left(v\right)e^{ivu}dv$ , where the characteristic function $\phi\left(v\right)$ satisfies that $v\phi\left(v\right)$ is integrable.

$\mathbf{K4}$

$a^{-3}n^{-1}+a\rightarrow 0$ as $n\rightarrow\infty$ .
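Condition $\mathbf{K4}$ restricts how fast the bandwidth may shrink. A worked check for a polynomial rate $a=c\,n^{-\lambda}$ with $c>0$ (an illustrative choice, not one prescribed above) shows that $\mathbf{K4}$ holds exactly when $0<\lambda<1/3$. In particular, the familiar MSE-optimal order $n^{-1/5}$ for second-order kernels is admissible:

```latex
% Illustrative bandwidth rate a = c\,n^{-\lambda}, c > 0
a^{-3}n^{-1}+a = c^{-3}n^{3\lambda-1}+c\,n^{-\lambda}\longrightarrow 0
\quad\Longleftrightarrow\quad 0<\lambda<1/3,
\qquad\text{e.g. }\lambda=\tfrac{1}{5}:\;
a^{-3}n^{-1}+a = c^{-3}n^{-2/5}+c\,n^{-1/5}\longrightarrow 0 .
```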

It is clear that the Epanechnikov and the Gaussian kernel functions satisfy $\mathbf{K1}$ , $\mathbf{K2}$ and $\mathbf{K3}$ . One important observation is that $\mathbf{K1}$ rules out higher-order kernels by assuming $\kappa_{2}\neq 0$ . The consequence of dropping the assumption that $\kappa_{2}\not=0$ is discussed in detail in Remark 2 that follows the next proposition.
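The moment conditions in $\mathbf{K1}$ are easy to check numerically. The sketch below uses plain Python with a hand-rolled Simpson rule (helper names are ours). It computes $\kappa_{0},\ldots,\kappa_{4}$ for the Epanechnikov and Gaussian kernels and for the fourth-order Gaussian-based kernel $\frac{1}{2}\left(3-u^{2}\right)\phi\left(u\right)$, which has $\kappa_{2}=0$ and is therefore excluded by $\mathbf{K1}$.

```python
import math

def simpson(f, lo, hi, m=2000):
    """Composite Simpson rule on [lo, hi] with an even number m of subintervals."""
    h = (hi - lo) / m
    total = f(lo) + f(hi)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def epanechnikov(u):
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0

def gaussian(u):
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def gaussian_fourth_order(u):
    # (3 - u^2) / 2 * phi(u): integrates to one but has kappa_2 = 0.
    return 0.5 * (3 - u * u) * gaussian(u)

def kappa(kernel, ell, lo, hi):
    """kappa_ell = integral of u^ell K(u) du over the (effective) support [lo, hi]."""
    return simpson(lambda u: u ** ell * kernel(u), lo, hi)

for name, K, lo, hi in [("Epanechnikov", epanechnikov, -1.0, 1.0),
                        ("Gaussian", gaussian, -10.0, 10.0),
                        ("4th-order Gaussian", gaussian_fourth_order, -10.0, 10.0)]:
    print(name, [round(kappa(K, ell, lo, hi), 4) for ell in range(5)])
```

Up to rounding, the output shows $\kappa_{2}=1/5$ and $\kappa_{4}=3/35\approx 0.0857$ for the Epanechnikov kernel, $\kappa_{2}=1$ and $\kappa_{4}=3$ for the Gaussian kernel, and $\kappa_{2}=0$ with $\kappa_{4}=-3$ for the fourth-order kernel, which is the case discussed in Remark 2.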

Proposition 3.

Suppose Assumptions Z, Q and K hold true. Then, under Assumption C

[TABLE]

while $\widehat{\xi}\overset{P}{\rightarrow}\xi$ under Assumption J.

Remark 2.

We now comment on the consequence of dropping the assumption that $\kappa_{2}\not=0$ . If we allowed for higher-order kernels, that is $\kappa_{2}=0$ and $\kappa_{3}=0$ but $\kappa_{4}\neq 0$ , $\widehat{\xi}$ would not be consistent. Indeed, Proposition 3 and Lemma 2 in the Appendix indicate that, without loss of generality for $\gamma_{0}=0$ and $\sigma^{2}=1$ , $\widehat{\xi}$ converges in probability to

[TABLE]

where $g_{r}\left(q\right)=E\left(x_{t2}^{r}\varepsilon_{t}^{2}\mid q_{t}=q\right)$ and $g_{r}^{\ast}\left(q\right)=E\left(x_{t2}^{r}\mid q_{t}=q\right)$. This is the case because, if we drop the assumption $\kappa_{2}\not=0$ from $\mathbf{K1}$ and let $\kappa_{2}=\kappa_{3}=0$, the numerator in the definition of $\widehat{\xi}$ becomes

[TABLE]

whereas the denominator in the definition of $\widehat{\xi}$ becomes

[TABLE]

Thus, unless $E(\varepsilon_{t}^{2}\mid q_{t}=\gamma_{0})=E(\varepsilon_{t}^{2})$, we obtain (by an argument similar to L'Hôpital's rule)

[TABLE]

and hence $\widehat{\xi}$ would not be a consistent estimator of the scale factor $\zeta$ .

We construct the $100s$ percent confidence set for $\gamma_{0}$ as

[TABLE]

This confidence set is valid under both scenarios, as the next theorem shows.

Theorem 2.

Let Assumptions K, Z and Q hold, and suppose that either Assumption C or Assumption J holds. Then, for any $s\in\left(0,1\right)$,

[TABLE]
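If the confidence set takes the form $\left\{\gamma:\widehat{\xi}^{-1}QLR_{n}\left(\gamma\right)\leq c\left(s\right)\right\}$, with $c\left(s\right)=-2\log\left(1-\sqrt{s}\right)$ the $s$-quantile of $F$ (our reading of the construction, stated here as an assumption), it can be computed on a grid as in the following sketch (function name hypothetical). Because the set need not be an interval, the sketch returns every retained grid point rather than only the endpoints.

```python
import math
import numpy as np

def asymptotic_confidence_set(grid, qlr, xi, s):
    """Grid points gamma with QLR_n(gamma) / xi <= c(s), c(s) = -2 log(1 - sqrt(s))."""
    crit = -2.0 * math.log(1.0 - math.sqrt(s))
    keep = np.asarray(qlr) / xi <= crit
    return np.asarray(grid)[keep]

# Toy example: a quadratic QLR profile around gamma = 0 and a scale estimate xi = 2.
grid = np.linspace(-1.0, 1.0, 21)
conf_set = asymptotic_confidence_set(grid, 40.0 * grid ** 2, 2.0, 0.95)
print(conf_set.min(), conf_set.max())
```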

4 BOOTSTRAP

This section develops a bootstrap-based test inversion confidence interval for the unknown threshold parameter $\gamma_{0}$, which is valid under Assumption C as well as under Assumption J. We do not discuss the bootstrap for $\alpha_{0}$ in detail, but note that the bootstrap for linear regression can be employed (see e.g. Shao and Tu $\left(1995\right)$), since we can treat $\widehat{\gamma}$ as $\gamma_{0}$ for inference on $\alpha_{0}$, owing to the arguments leading to the asymptotic independence between $\widehat{\alpha}$ and $\widehat{\gamma}$. [Footnote 2: This excludes the case where $\gamma_{0}$ is not strongly identified in the sense that $\delta_{0}=d\cdot n^{-\varphi}$ with $\varphi\geq 1/2$. This case has not been explored except when $d=0$, see e.g. Hansen (1996), and it is an interesting area for future research.]

We propose using the bootstrap test inversion method of Dümbgen $\left(1991\right)$, also known as the grid bootstrap, to build confidence intervals for the parameter $\gamma$; see also Carpenter $\left(1999\right)$ and Hansen $\left(1999\right)$. Such a test inversion bootstrap confidence interval (BCI) is known to have certain optimality properties from the Bayesian perspective, see e.g. Brown, Casella and Hwang $\left(1995\right)$. Mikusheva $\left(2007\right)$ showed that the test inversion BCI attains correct coverage probability uniformly over the parameter space for the sum of coefficients in autoregressive models, even though the behavior of the estimator is not uniform over the parameter space.

For a given confidence level $s$, one can exploit the duality between hypothesis tests and confidence intervals, inverting tests to obtain the confidence region

[TABLE]

where $F_{n}^{\ast}\left(s|\gamma\right)$ is the bootstrap estimate of the $s$th quantile of the statistic $\widehat{\xi}\left(\gamma\right)^{-1}QLR_{n}\left(\gamma\right)$ when $\gamma_{0}=\gamma$. In other words, it is the bootstrap critical value for a level-$(1-s)$ test of $\mathcal{H}_{0}:\gamma_{0}=\gamma$. In practice, one would estimate $F_{n}^{\ast}\left(s|\gamma\right)$ over a grid of values of $\gamma$ and use some smoothing method, such as linear interpolation or kernel averaging, to obtain a smoothed bootstrap quantile function over a range of $\gamma$. The region $\widehat{\Gamma}_{s}^{\ast}$ is known as the $s$-level grid bootstrap confidence interval (BCI) of $\gamma$ in the terminology of Hansen $\left(1999\right)$.
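In practice the region is found by comparing two interpolated curves. The sketch below is one way to do this, assuming linear interpolation (one of the smoothing options mentioned above) of both the rescaled statistic and the bootstrap quantiles $F_{n}^{\ast}\left(s|\gamma_{j}\right)$ onto a common fine grid. The function name and toy inputs are hypothetical.

```python
import numpy as np

def grid_bootstrap_set(stat_grid, stat, quant_grid, quant, fine_grid):
    """Points of fine_grid where the linearly interpolated rescaled statistic
    lies on or below the linearly interpolated bootstrap quantile function."""
    fine_grid = np.asarray(fine_grid)
    stat_fine = np.interp(fine_grid, stat_grid, stat)    # statistic on the fine grid
    crit_fine = np.interp(fine_grid, quant_grid, quant)  # F*_n(s | gamma) on the fine grid
    return fine_grid[stat_fine <= crit_fine]

# Toy inputs: statistic on 41 points, bootstrap quantiles on a coarser 5-point grid.
stat_grid = np.linspace(0.0, 4.0, 41)
quant_grid = np.linspace(0.0, 4.0, 5)
region = grid_bootstrap_set(stat_grid, 10.0 * (stat_grid - 2.0) ** 2,
                            quant_grid, np.full(5, 10.0), np.linspace(0.0, 4.0, 401))
print(region.min(), region.max())   # roughly 1 and 3
```

The quantile grid can be much coarser than the statistic grid, because the bootstrap quantiles are the expensive part of the computation.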

Figure 1 illustrates how this confidence interval can be obtained in practice. The $QLR_{n}\left(\gamma\right)$ line is the linear interpolation of the rescaled $QLR_{n}\left(\gamma\right)$ statistic over a grid of 50 values of $\gamma$. The ACV line is the asymptotic critical value of Hansen $\left(2000\right)$. The true value of $\gamma_{0}$ was $2$. We estimated the bootstrap quantile function (described below) at 17 grid points and present the interpolated line as the Grid quantile plot. The vertical arrows at the intersections between $QLR_{n}\left(\gamma\right)$ and ACV yield the asymptotic confidence interval (ACI), while the vertical broken arrows indicate the grid BCI based on the bootstrap.

We now describe the grid bootstrap procedure, which we repeat for each value $\gamma_{j}\in\left\{\gamma_{1},...,\gamma_{g}\right\}$.

4.1 Bootstrap Algorithm for each $\gamma_{j}$

STEP 1

Obtain the LSE $\left(\widehat{\alpha}^{\prime},\widehat{\gamma}\right)^{\prime}$ by minimizing the least squares criterion and compute the LSE residuals

[TABLE]

STEP 2

Generate $\left\{\eta_{t}\right\}_{t=1}^{n}$ as $i.i.d.$ zero mean random variables with unit variance and finite fourth moments, and compute

[TABLE]

STEP 3

Obtain the least squares estimate using $\{y_{t}^{\ast}\}_{t=1}^{n}$ and $\{x_{t}\}_{t=1}^{n}$:

[TABLE]

STEP 4

Compute the bootstrap analogues of $QLR_{n}$ and $\widehat{\xi}$ as

[TABLE]

and

[TABLE]

where $\widehat{\mathbb{S}}_{n}^{\ast}\left(\gamma\right)$ is defined analogously to $\widehat{\mathbb{S}}_{n}\left(\gamma\right)$ in (6), with $y_{t}$ replaced by $y_{t}^{\ast}$.

STEP 5

Compute the bootstrap $s$th quantile $F_{n}^{\ast}\left(s|\gamma_{j}\right)$ from the empirical distribution of $\widehat{\xi}^{\ast-1}QLR_{n}^{\ast}$ obtained by repeating STEPS 2-4.
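The following Python sketch strings STEPS 1-5 together for a single $\gamma_{j}$. Several details are our assumptions rather than the displays above. The bootstrap sample in STEP 2 is taken to be $y_{t}^{\ast}=x_{t}\left(\gamma_{j}\right)^{\prime}\widehat{\alpha}+\widehat{\varepsilon}_{t}\eta_{t}$ with Rademacher $\eta_{t}$, which meet the moment conditions of STEP 2. $QLR_{n}^{\ast}$ uses the Hansen-type form $n\left(\mathbb{S}^{\ast}\left(\gamma_{j}\right)-\mathbb{S}^{\ast}\left(\widehat{\gamma}^{\ast}\right)\right)/\mathbb{S}^{\ast}\left(\widehat{\gamma}^{\ast}\right)$, and $\widehat{\xi}^{\ast}$ is an Epanechnikov kernel ratio at $\widehat{\gamma}^{\ast}$. All function names are hypothetical.

```python
import numpy as np

def _fit(y, X, q, gamma):
    """OLS of y on x_t(gamma) = (x_t, x_t 1{q_t > gamma}); returns (coef, resid)."""
    Z = np.hstack([X, X * (q > gamma)[:, None]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef, y - Z @ coef

def _lse(y, X, q, grid):
    """Grid-search LSE: returns (alpha_hat, gamma_hat, residuals, S_n on the grid)."""
    ssr = np.array([np.mean(_fit(y, X, q, g)[1] ** 2) for g in grid])
    j = int(np.argmin(ssr))
    coef, resid = _fit(y, X, q, grid[j])
    return coef, grid[j], resid, ssr

def _scaled_qlr(y, X, q, grid, gamma, a):
    """xi_hat^{-1} QLR_n(gamma) in the assumed Hansen-type forms."""
    coef, g_hat, resid, ssr = _lse(y, X, q, grid)
    s_gamma = np.mean(_fit(y, X, q, gamma)[1] ** 2)
    s_min = min(ssr.min(), s_gamma)               # gamma need not lie on the grid
    qlr = len(y) * (s_gamma - s_min) / s_min
    k = X.shape[1]
    kern = 0.75 * np.clip(1.0 - ((q - g_hat) / a) ** 2, 0.0, None)  # Epanechnikov
    w = kern * (X @ coef[k:]) ** 2                # coef[k:] plays the role of delta
    xi = np.sum(w * resid ** 2) / (np.mean(resid ** 2) * np.sum(w))
    return qlr / xi

def grid_bootstrap_quantile(y, X, q, grid, gamma_j, s, a, B=199, seed=0):
    """Bootstrap s-quantile F*_n(s | gamma_j) following STEPS 1-5 (details assumed)."""
    rng = np.random.default_rng(seed)
    alpha_hat, _, resid, _ = _lse(y, X, q, grid)                # STEP 1
    Z_j = np.hstack([X, X * (q > gamma_j)[:, None]])
    stats = np.empty(B)
    for b in range(B):
        eta = rng.choice([-1.0, 1.0], size=len(y))              # STEP 2: Rademacher weights
        y_star = Z_j @ alpha_hat + resid * eta                   # imposes gamma_0 = gamma_j
        stats[b] = _scaled_qlr(y_star, X, q, grid, gamma_j, a)  # STEPS 3-4
    return float(np.quantile(stats, s))                          # STEP 5
```

Calling `grid_bootstrap_quantile` for each $\gamma_{j}$ on a coarse grid yields the points that are interpolated into the bootstrap quantile function.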

Next, we derive the limiting distributions of the bootstrap LSE $\widehat{\alpha}^{\ast}$ and $\widehat{\gamma}^{\ast}$ for both the continuous and discontinuous setups and show the consistency of the bootstrap statistic $\widehat{\xi}^{\ast}$. These results then yield the validity of the bootstrap test inversion confidence set, following the same arguments as in the proof of Theorem 2.

As usual, the superscript “∗” indicates bootstrap quantities, and convergence of bootstrap statistics is conditional on the original data. As in Shao and Tu (1995), the notation “$\overset{d^{\ast}}{\longrightarrow}$, in probability” signifies convergence in probability of the random distribution functions of the bootstrap statistics in the uniform metric, and $A_{n}^{\ast}=o_{p^{\ast}}\left(1\right)$ means that $A_{n}^{\ast}\overset{d^{\ast}}{\longrightarrow}0$, in probability.

Theorem 3.

Suppose that Assumptions Z and Q hold true.

$\left(\mathbf{a}\right)$ Under Assumption C, $\widehat{\alpha}^{\ast}$ and $\widehat{\gamma}^{\ast}$ are asymptotically independent and (in probability)

[TABLE]

$\left(\mathbf{b}\right)$ Under Assumption J, $\widehat{\alpha}^{\ast}$ and $\widehat{\gamma}^{\ast}$ are asymptotically independent and (in probability)

[TABLE]

Our results can be compared with those already obtained in the literature on the validity of the bootstrap for non-standard estimators. First, our consistency result seems to contradict the result of Seijo and Sen $\left(2011\right)$ on the inconsistency of a residual-based bootstrap and of the nonparametric bootstrap (with $i.i.d.$ data) for the case where $\varphi=0$; see also Yu $\left(2014\right)$. The reason for these seemingly contradictory conclusions is that our setup differs from theirs in a crucial way: they consider a break of fixed size, whereas we consider $\delta_{0}=d\cdot n^{-\varphi}$ decreasing with the sample size. Thus, their limiting distribution depends on the whole conditional distribution of $\varepsilon_{t}\eta_{t}d^{\prime}x_{t}$ given $q_{t}=\gamma_{0}$ in a complicated manner, whereas ours contains only an unknown scaling factor.

It is worth mentioning that the centering term for $\widehat{\gamma}^{\ast}$ is $\gamma_{0}$ , which reflects the fact that our resampling scheme imposes the hypothesized true value for the unknown threshold. This is important for the validity of our bootstrap since we do not impose the continuity restriction in our bootstrap resampling. By imposing the null value, our resampling scheme builds on $\sqrt{n}$ -consistent estimates.

Next, the consistency of $\widehat{\xi}^{\ast}$ is established in the following proposition.

Proposition 4.

Suppose Assumptions Z, Q and K hold and either Assumption J or Assumption C holds. Then,

[TABLE]

A direct consequence of Theorem 3 and Proposition 4 is the following theorem.

Theorem 4.

Suppose that either Assumption J or Assumption C holds, in addition to Assumptions Z, Q and K. Then (in probability)

[TABLE]

5 Monte Carlo Experiment

We generate data based on the following 3 specifications, with settings A and B being jump models akin to that considered in Hansen (2000, Section 4.2) and setting C representing the kink case.

[TABLE]

The main difference between our data generating process and that of Hansen (2000) is the conditional heteroscedasticity in $\varepsilon_{t}$: we set $\varepsilon_{t}=|q_{t}|e_{t}$, where $\left\{e_{t}\right\}_{t\geq 1}$ and $\left\{q_{t}\right\}_{t\geq 1}$ are generated as mutually independent $i.i.d.$ normal random variables with unit variance. This leads to conditional heteroscedasticity of the form $E(\varepsilon_{t}^{2}|q_{t})=q_{t}^{2}$, in contrast to Hansen (2000), where $\varepsilon_{t}$ was generated from $N(0,1)$. In setting A, we generated $x_{t}$ as $i.i.d.$ draws from $N(2,1)$, independent of $\left\{e_{t}\right\}_{t\geq 1}$ and $\left\{q_{t}\right\}_{t\geq 1}$, and set $Eq_{t}=2$. We generate $\left\{e_{t}\right\}_{t\geq 1}$ and $\left\{q_{t}\right\}_{t\geq 1}$ in the same way for setting B. For both settings A and B, we try $\gamma_{0}=2$ and $2.674$, which correspond to the median and the third quartile of $q_{t}$, respectively. In setting C, we set $\gamma_{0}=0$ and try $Eq_{t}=0$ or $-0.674$, so that the threshold corresponds to the median or the third quartile of $q_{t}$, respectively. For the grid $\Gamma_{n}$ used in the estimation of $\gamma_{0}$, we discarded $10\%$ of the extreme values of the realized $q_{t}$ and used $n/2$ equidistant points.
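The error design is straightforward to reproduce. The sketch below generates setting A's $q_{t}$, $e_{t}$, $\varepsilon_{t}=|q_{t}|e_{t}$ and $x_{t}$, with a seed and sample size chosen arbitrarily by us. It then checks that $\gamma_{0}=2$ and $2.674$ are approximately the median and the third quartile of $q_{t}\sim N(2,1)$. The regression equations themselves are given in the display above and are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2020)      # arbitrary seed
n = 100_000                            # large n only to make the checks visible
q = 2.0 + rng.standard_normal(n)       # setting A: q_t ~ N(2, 1)
e = rng.standard_normal(n)             # e_t ~ N(0, 1), independent of q_t
eps = np.abs(q) * e                    # epsilon_t = |q_t| e_t
x = 2.0 + rng.standard_normal(n)       # setting A: x_t ~ N(2, 1), independent of (e_t, q_t)

# E(eps^2 | q) = q^2, so eps^2 / q^2 = e^2 should average close to 1.
print(np.mean(eps ** 2 / q ** 2))
# gamma_0 = 2 and 2.674 should be close to the median and third quartile of q_t.
print(np.quantile(q, [0.5, 0.75]))
```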

We investigate the finite-sample performance of the tests and confidence regions for $\gamma$ given in Sections 3 and 4. We first compare the Monte Carlo size of tests for the correct location of the threshold based on the asymptotic theory of Hansen $\left(2000\right)$, which covers diminishing jump models, with that of our bootstrap method. We then investigate the coverage probabilities of confidence intervals constructed either from the asymptotic theory of Hansen $\left(2000\right)$ or by test inversion based on our bootstrap. Our method has the virtue of robustness across different settings, and the objective is to see how it performs across the jump settings A and B and the kink setting C. In A and B, we try two sets of $\delta$ with $\varphi=1/4,1/8$: $\delta=n^{-1/4}\sqrt{10}/4=0.25,0.1988,0.1672$ and $\delta=n^{-1/8}\sqrt{10}/4=0.4446,0.3965,0.3636$ for $n=100,250,500$, reflecting Assumption J. In setting C, $\delta$ is fixed at $\delta=2$, in line with Assumption C. [Footnote 3: Note that $\delta=0.25$ and $2$ were the smallest and the largest values of $\delta$ tried in Hansen (2000), respectively.] For the estimate $\widehat{\xi}$ of the scale factor for the $QLR_{n}$ statistic, we used the Epanechnikov kernel and the minimum-MSE bandwidth choice given in Härdle and Linton $(1994)$.
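The jump sizes quoted above follow directly from $\delta=n^{-\varphi}\sqrt{10}/4$. A quick arithmetic check (function name ours):

```python
import math

def delta(n, phi):
    """Jump size delta = n^{-phi} * sqrt(10) / 4 used in settings A and B."""
    return n ** (-phi) * math.sqrt(10) / 4

for phi in (1 / 4, 1 / 8):
    print(phi, [round(delta(n, phi), 4) for n in (100, 250, 500)])
# 0.25 [0.25, 0.1988, 0.1672]
# 0.125 [0.4446, 0.3965, 0.3636]
```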

Columns 4-6 of Tables 1-3 present the Monte Carlo size of the test of $H_{0}:\gamma=\gamma_{0}$ when $\gamma_{0}$ is the median of $q_{t}$, for nominal sizes $s=0.1,0.05,0.01$, in the three settings. We carried out 10,000 iterations, with one bootstrap per iteration, using the warp-speed method of Giacomini, Politis and White (2013). Using the asymptotic critical values delivers poor Monte Carlo sizes in settings A and B, with substantial over-sizing, which is more severe in setting B. In contrast, the bootstrap test produces sizes close to the nominal ones, apart from $n=100$ in B, for both values of $\varphi$. For the asymptotic test, the size results are somewhat better when $\varphi=1/8$ than when $\varphi=1/4$ in settings A and B, although the over-sizing remains severe even for $\varphi=1/8$ in setting B, as shown in Table 2. For the kink setting C, the asymptotic test based on Hansen's $(2000)$ results produces sizes that become very small with increasing $n$, while the bootstrap test delivers good sizes for $n=250,500$.
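The warp-speed idea is to draw a single bootstrap sample per Monte Carlo replication and to take the critical value from the pooled bootstrap statistics across replications. A generic sketch follows, with a toy statistic (a squared $t$-ratio for a zero mean, with a wild bootstrap) standing in for $\widehat{\xi}^{-1}QLR_{n}$. Everything in it, including the names, is illustrative and is not a reproduction of the experiments in Tables 1-3.

```python
import numpy as np

def warp_speed_size(stat_fn, boot_fn, simulate, n_rep, level, seed=0):
    """Monte Carlo size with one bootstrap statistic per replication; the critical
    value is the (1 - level) quantile of the pooled bootstrap statistics."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_rep)
    boots = np.empty(n_rep)
    for r in range(n_rep):
        data = simulate(rng)
        stats[r] = stat_fn(data)
        boots[r] = boot_fn(data, rng)   # a single bootstrap draw per replication
    crit = np.quantile(boots, 1.0 - level)
    return float(np.mean(stats > crit))

# Toy illustration: squared t-ratio for H0: mean = 0, wild bootstrap of centred data.
def simulate(rng):
    return rng.standard_normal(50)

def sq_t(y):
    return len(y) * y.mean() ** 2 / y.var()

def wild_boot(y, rng):
    return sq_t((y - y.mean()) * rng.choice([-1.0, 1.0], size=len(y)))

print(warp_speed_size(sq_t, wild_boot, simulate, n_rep=4000, level=0.05))
```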

Columns 8-10 of Tables 1-3 report the coverage probabilities of confidence intervals for $\gamma_{0}$ in the three settings when $\gamma_{0}$ is the median of $q_{t}$, and columns 11-13 present the case when $\gamma_{0}$ is the third quartile of $q_{t}$, for confidence levels $s=0.9,0.95,0.99$. Results are based on 1,000 iterations. In each iteration, we generated bootstrap quantile plots by interpolating bootstrap quantiles obtained at 10 equidistant points of the realized support of $q_{t}$ from 399 bootstrap replications, and found their intersections with the sample $QLR_{n}$ plot, formed by interpolating between $n/2$ equidistant points after discarding $10\%$ of the extreme values of the realized $q_{t}$.

In settings A and B, reported in Tables 1 and 2, the coverage probabilities are better for both methods when $\gamma_{0}$ is the third quartile of $q_{t}$ and $\varphi=1/4$. For $\varphi=1/8$ this is still the case, with the exception of the bootstrap coverage probabilities in setting A, which are similar for the two values of $\gamma_{0}$. In setting A, as shown in Table 1, the asymptotic and bootstrap methods perform similarly, reporting lower-than-nominal coverage probabilities that improve with larger $n$. In setting B, the bootstrap method delivers substantially better coverage probabilities than the asymptotic confidence intervals based on Hansen $(2000)$, which remain well below the nominal level even for $n=500$ when $\varphi=1/4$. Such under-coverage of asymptotic confidence intervals for small $\delta=0.25$ was also reported in Hansen's (2000) Table 2 for the homoskedastic error case. The coverage probabilities are better when $\varphi=1/8$ than when $\varphi=1/4$ for both methods in setting B, especially for the asymptotic confidence intervals. In Hansen's (2000) Table 2, coverage probability was also good for $\delta=0.5$.

In setting C, reported in Table 3, the asymptotic coverage probabilities become close to 1 for all values of $s$ for $n=250,500$, while the bootstrap coverage probabilities are satisfactory for $n=250,500$. The bootstrap coverage probability is better when $\gamma_{0}$ is the third quartile of $q_{t}$ than when it is the median. [Footnote 4: In Table 4 in the Online Appendix, we report Monte Carlo size and coverage probability results for $\gamma$ when $\varphi=0$, with $\delta$ fixed at $\sqrt{10}/4=0.7906$ and $0.25$, in setting A ($q_{t}\neq x_{t}$) with homoscedastic errors. The fixed-jump setup is covered neither by Hansen (2000) nor by our bootstrap of Section 4, but we nonetheless investigate how the two methods perform in this setting for completeness.]

6 EMPIRICAL APPLICATION: GROWTH AND DEBT

The so-called Reinhart-Rogoff hypothesis postulates that above some threshold (90$\%$ being their estimate of this threshold), a higher debt-to-GDP ratio is associated with a lower GDP growth rate. Numerous studies have used threshold regression models to assess this hypothesis, including Hansen $(2017)$, who fitted a kink model to a time series of US annual data; see Hansen $(2017)$ for references to earlier studies that fitted jump models to various data sets. As economic theory offers little guidance on the choice between kink and jump models in this setting, we advocate the use of our robust inference on the threshold and slope parameters of the model.

Hansen $(2017)$ fitted a kink model to US annual data on the real GDP growth rate in year $t$ ($y_{t}$) and the debt-to-GDP ratio in the previous year ($q_{t}$) for the period 1792-2009 ($n=218$), and estimated the threshold to be $43.8\%$, while the slope parameters of $q_{t}$ were not significant. Before fitting the jump model to these data, we first tested for the presence of a threshold effect using the testing procedure of Hansen (1996) with 1,000 bootstrap replications, and obtained a $p$-value of 0.047, rejecting the null hypothesis of no threshold effect. This is in contrast to the $p$-value of 0.15 obtained by Hansen's $(2017)$ test for the presence of a threshold effect when imposing the kink model. Hansen $(2017)$ remained inconclusive about the presence of a kink threshold effect, since the bootstrap method used there did not account for the time series nature of the data and the high $p$-value could have been due to the modest power of the test.

The fitted jump model is given by:

[TABLE]

The sizes of the two regimes were 99 (below 17.2$\%$) and 109 (above 17.2$\%$). We obtained grid bootstrap confidence intervals for $\gamma_{0}$ of (10.5, 39) at the 95$\%$ confidence level and (10.8, 38.6) at 90$\%$, based on 399 bootstrap iterations. Bootstrap quantiles were obtained at 38 grid points, which included $\widehat{\gamma}$, $\widetilde{\gamma}$ and equidistant points on the realized support of $q_{t}$ after discarding 7.5$\%$ of the largest and smallest values of $q_{t}$ in the sample. [Footnote 5: There is currently no theoretical guide to the choice of the trimming parameter. Our choice of trimming out 7.5$\%$ was guided by Sweden's estimated $\tilde{\gamma}$ being the 12th percentile of $q_{t}$ in the data. A sensitivity check on the choice of trimming value is recommended.] We find the points of intersection between the linearly interpolated bootstrap quantile line and the linear interpolation of the sample $QLR_{n}(\gamma)$ test statistics for $H_{0}:\gamma_{0}=\gamma_{j}$ at grid points $\gamma_{j}$ consisting of 73 equidistant points and $\widehat{\gamma}$, $\widetilde{\gamma}$, as shown in Figure 2 for the 90$\%$ confidence level.

As the estimated threshold under the jump model is noticeably small at 17.2$\%$, our estimated jump model, which suggests that $q_{t}$ has no significant effect on $y_{t}$ above the threshold, does not necessarily contradict the Reinhart-Rogoff hypothesis. To see whether this indicates the presence of further threshold points, we applied Hansen's (1996) testing procedure for the presence of a threshold effect to the lower and upper subsamples with 1,000 bootstrap replications and obtained $p$-values of 0.025 and 0.016, respectively. Hence, we conclude that the US time series data should be fitted with a threshold regression model with multiple threshold points.

To see whether this conclusion holds across countries, we first applied Hansen's (1996) test for the presence of a threshold effect to Reinhart and Rogoff's (2010) data for countries with relatively long time spans without missing observations. For Australia ($n=107$) and the UK ($n=178$), the $p$-values with 1,000 bootstrap replications were 0.795 and 0.98, respectively, so we conclude that there is no threshold effect for these countries in the relationship between GDP growth and the debt-to-GDP ratio.

For data from Sweden for the period 1881-2009 ($n=129$), the $p$-value of Hansen's (1996) test for the presence of a threshold effect with 1,000 bootstrap replications is 0.048 for the whole sample, while for the lower and upper regimes, divided by $\widehat{\gamma}$, the $p$-values are 0.979 and 0.131, respectively. The estimated jump model is:

[TABLE]

with the lower regime having 61 observations and upper regime containing 68. The coefficient of debt-to-GDP ratio is not statistically significant.

The grid bootstrap confidence intervals for $\gamma_{0}$ were (15.3, $\infty$) and (16.4, $\infty$) at the 95$\%$ and 90$\%$ confidence levels. Figure 3 shows the linear interpolation of the 90$\%$ bootstrap quantiles at 27 grid points with 399 bootstrap replications, together with the linear interpolation of the QLR test statistic at 54 grid points.

We conclude that there is substantial heterogeneity across countries in the relationship between the GDP growth and the debt-to-GDP ratio, not only in the values of model parameters, but also in the kinds of models that are suitable.

7 CONCLUSION

This paper has developed unified inferential procedures for the threshold regression model. The unconstrained least squares estimator of the regression coefficient $\alpha$ turns out to enjoy a useful oracle property, which enables standard asymptotic normal inference as in the linear regression model. On the other hand, we provide a judiciously constructed statistic with which one can make inference on the unknown threshold without knowing whether the threshold regression model is continuous. Asymptotically valid bootstrap inference is also proposed and shown to improve the finite-sample performance of the asymptotic procedure.

An interesting area for future research is the extension to the nonparametric setting. See, for instance, Card et al. (2008) and Pan (2015), who use regression discontinuity methods to test for the tipping phenomenon in racial segregation and gender segregation, respectively [Footnote 6: Pan (2015, p.378) and a referee emphasize that this setting is not identical to the conventional regression discontinuity method (e.g. Angrist and Lavy (1999); Hahn et al. (2001)) due to the lack of knowledge of the precise location of the discontinuity.], or Landais (2014), who recommends testing for the location of the change-point as a validity check for the regression discontinuity design, even when the change-point is suggested by institutional knowledge.

Appendix A PROOFS OF MAIN THEOREMS

Let us introduce some notation first. In what follows $C,C_{1},$ … denote generic positive finite constants, which may vary from line to line or expression to expression. Recall that $x_{t}=\left(1,x_{t2}^{\prime},q_{t}\right)^{\prime},$ $x_{t1}=\left(1,x_{t2}^{\prime}\right)^{\prime}$ , and $\mathbf{1}_{t}\left(b\right)=\mathbf{1}\left\{q_{t}>b\right\}$ , and introduce $\mathbf{1}_{t}\left(a;b\right)=\mathbf{1}\left\{a<q_{t}<b\right\}$ . Finally, we abbreviate $\psi-\psi_{0}$ by $\overline{\psi}$ for any parameter $\psi$ .

All the technical lemmas are given in the online supplement to this paper.

A.1 Proof of Proposition 1

Without loss of generality we assume that $\widehat{\gamma}\geq\gamma_{0}$ and $\gamma_{0}=0$, so that $\delta_{10}=0$ and $\delta_{20}=0$ under Assumption C. By definition, we have that

[TABLE]

By standard algebra and denoting $\upsilon=\beta+\delta$ ,

[TABLE]

which implies, because of the orthogonality of the terms on the right of the last displayed expression, that

[TABLE]

where

[TABLE]

Consistency. It suffices to show that for any $\epsilon>0$ and $\eta>0$, there is $n_{0}$ such that for all $n>n_{0}$, $\Pr\left\{\left\|\widehat{\theta}-\theta_{0}\right\|>\eta\right\}<\epsilon$, which is implied by

[TABLE]

where $\mathbb{D}_{n\ell}\left(\theta\right)=\mathbb{B}_{n\ell}\left(\theta\right)+\left(\mathbb{A}_{n\ell}\left(\theta\right)-E\left(\mathbb{A}_{n\ell}\left(\theta\right)\right)\right)$ for $\ell=1,2,3$ .

First, $\left\|\overline{\theta}\right\|>\eta$ implies that either $\left(\mathbf{i}\right)$ $\left\|\overline{\gamma}\right\|>\eta/3$ and $\left\|\overline{\beta}\right\|\leq\eta/3$, or $\left(\mathbf{ii}\right)$ $\left\|\overline{\beta}\right\|>\eta/3$ or $\left\|\overline{\upsilon}\right\|>\eta/3$. When $\left(\mathbf{ii}\right)$ holds true, it is clear that

[TABLE]

whereas when $\left(\mathbf{i}\right)$ holds true, we have that

[TABLE]

because Assumption Q implies that $E\left(x_{t}x_{t}^{\prime}\mathbf{1}_{t}\left(\gamma\right)\right)$ , $E\left(x_{t}x_{t}^{\prime}\mathbf{1}_{t}\left(-\infty;0\right)\right)$ and $E\left(x_{t}x_{t}^{\prime}\mathbf{1}_{t}\left(0;\gamma\right)\right)$ are positive definite matrices uniformly in $\gamma>\eta$ , and because $\left\|\bar{\beta}+\delta_{0}\right\|>\eta/3$ whenever $\left\|\overline{\beta}\right\|\leq\eta/3$ , since we can always choose $\eta$ small enough that $\left\|\delta_{0}\right\|>2\eta/3$ . We have that

[TABLE]

where $\overline{\tau}=\left(\beta_{0}-\beta\right)+\delta_{0}$ . The last displayed inequality is motivated by the fact that, for $\gamma_{1}<\gamma_{2}$ , $E\left\{x_{t}x_{t}^{\prime}\mathbf{1}_{t}\left(\gamma_{1};\gamma_{2}\right)\right\}$ is a finite, strictly positive definite matrix, which implies that for any vector $a^{\prime}=\left(a_{1}^{\prime},a_{2}\right)$ ,

[TABLE]

So, $\left(\ref{ineq_2}\right)$ and $\left(\ref{ineq_3}\right)$ imply that

[TABLE]

On the other hand, Lemma 1 and the uniform law of large numbers, respectively, imply that

[TABLE]

where $\mathbb{F}_{n}\left(\gamma_{1};\gamma_{2}\right)=\frac{1}{n}\sum_{t=1}^{n}\left(x_{t}x_{t}^{\prime}\mathbf{1}_{t}\left(\gamma_{1};\gamma_{2}\right)-E\left(x_{t}x_{t}^{\prime}\mathbf{1}_{t}\left(\gamma_{1};\gamma_{2}\right)\right)\right)$ , and hence

[TABLE]

Thus $\widehat{\theta}-\theta_{0}=o_{p}\left(1\right)$ because the left side of $\left(\ref{consi_2}\right)$ is bounded by

[TABLE]

using $\left(\ref{ineq_1}\right)$ and $\left(\ref{ineq_4}\right)$ .

**Convergence Rate.** We shall next show that for any $\epsilon>0$ there exist $C>0$ , $\eta>0$ , $n_{0}$ such that for $n>n_{0}$ we have that

[TABLE]

Since $\Pr\left\{X_{n}+Y_{n}<0\right\}\leq\Pr\left\{X_{n}<0\right\}+\Pr\left\{Y_{n}<0\right\}$ for any sequences $X_{n}$ and $Y_{n}$ , and $\inf_{x}\left\{f\left(x\right)+g\left(x\right)\right\}\geq\inf_{x}f\left(x\right)+\inf_{x}g\left(x\right)$ for any functions $f$ and $g$ , it suffices to show that for each $\ell=1,2,3$

[TABLE]

To that end, we shall first examine

[TABLE]

where

[TABLE]

Recall that we have assumed that $\gamma\geq 0$ , as the case $\gamma\leq 0$ follows similarly.

First by standard arguments,

[TABLE]

by Lemma 1 and Markov’s inequality. Observe that the latter inequality is independent of $\Xi_{k}\left(\gamma\right)$ . Since $\sum_{j=1}^{\infty}2^{-j}<\infty$ , the probability in $\left(\ref{rate_1b}\right)$ can be made arbitrarily small by choosing $C$ large or $\eta$ small, so that condition $\left(\ref{rate_1b}\right)$ is satisfied. Condition $\left(\ref{rate_1a}\right)$ follows similarly, as is the case for $\ell=2$ , and is thus omitted.

We next examine $\left(\ref{rate_1a}\right)$ and $\left(\ref{rate_1b}\right)$ for $\ell=3$ . In view of $\left(\ref{pd}\right)$ and the arguments that follow it, and defining

[TABLE]

it suffices to show $\left(\ref{rate_1a}\right)$ and $\left(\ref{rate_1b}\right)$ for $\widetilde{\mathbb{A}}_{n3}\left(\theta\right)$ and $\widetilde{\mathbb{B}}_{n3}\left(\theta\right)$ . To that end, because $\overline{\tau}>C_{1}$ as $\left|\delta_{30}\right|>C_{1}>0$ , and since $Eq_{t}^{2}\mathbf{1}_{t}\left(0;\eta\right)\geq C_{1}\eta^{3}$ , we obtain

[TABLE]

by Lemma 1 and Markov’s inequality. Notice that this bound is independent of $\Xi_{j}\left(\upsilon\right)$ . Hence, by the summability of $2^{-3k/2}$ , we conclude that $\left(\ref{rate_1b}\right)$ holds true for $\ell=3$ by choosing $C$ large enough.

We now conclude the proof after we note that the left side of $\left(\ref{rate_1}\right)$ is bounded by

[TABLE]

using $\left(\ref{rate_2}\right)-\left(\ref{rate_4}\right)$ . $\blacksquare$

A.2 Proof of Theorem 1

Because of the argmin continuous mapping theorem, see Kim and Pollard $\left(1990\right)$ , and because the convergence rates of $\widehat{\alpha}$ and $\widehat{\gamma}$ are obtained in Proposition 1, it suffices to examine the weak limit of

[TABLE]

over $\left\|h\right\|,\left|g\right|\leq C$ , where we assume $\gamma_{0}=0$ as before for notational convenience and reparametrize $h=\sqrt{n}\left(\alpha-\alpha_{0}\right)$ and $g=n^{1/3}\left(\gamma-\gamma_{0}\right).$ First, due to the uniform law of large numbers it follows that

[TABLE]

whereas Lemma 1 and the expansion of $E\left\{x_{t}\left(\frac{g}{n^{1/3}}\right)q_{t}\mathbf{1}_{t}\left(0;\frac{g}{n^{1/3}}\right)\right\}$ as in (30) imply that

[TABLE]

Therefore

[TABLE]

where

[TABLE]

The consequence of $\left(\ref{diff}\right)$ is then that the minimizer of $\mathbb{G}_{n}\left(h,g\right)$ is asymptotically equivalent to that of $\widetilde{\mathbb{G}}_{n}\left(h,g\right)$ . Thus, it suffices to show the weak convergence of $\widetilde{\mathbb{G}}_{n}^{1}\left(h\right)$ and $\widetilde{\mathbb{G}}_{n}^{2}\left(g\right)$ and that

[TABLE]

are $O_{p}\left(1\right)$ . The convergence of $\widetilde{\mathbb{G}}_{n}^{1}\left(h\right)$ and its minimization is straightforward since it is a quadratic function of $h.$

Next, the first term of $\widetilde{\mathbb{G}}_{n}^{2}\left(g\right)$ converges to $3^{-1}\delta_{30}^{2}f\left(0\right)\left|g\right|^{3}$ uniformly in probability, because Lemma 1, specifically (47), implies the uniform law of large numbers, and a third-order Taylor expansion yields

[TABLE]

where $\widetilde{g}\in\left(0,g\right)$ . The case $g<0$ follows similarly; there the derivative is multiplied by $-1$ , so that the limit is again $3^{-1}\delta_{30}^{2}f\left(0\right)\left|g\right|^{3}$ .
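The cubic limit can be checked numerically. The Python sketch below is only an illustration, not part of the proof: it assumes $q_{t}\sim N(0,1)$, so that $f(0)=1/\sqrt{2\pi}$, and evaluates $nE\{q_{t}^{2}\mathbf{1}_{t}(0;g/n^{1/3})\}$ exactly through the antiderivative $\Phi(q)-q\phi(q)$ of $q^{2}\phi(q)$. As $n$ grows, the value should approach $3^{-1}f(0)|g|^{3}$, the limit stated above divided by $\delta_{30}^{2}$. The function names are hypothetical.

```python
import math

def normal_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def scaled_second_moment(g, n):
    """n * E[q^2 1{q in window}] for q ~ N(0, 1); window (0, g n^{-1/3}) or (g n^{-1/3}, 0).

    Uses the antiderivative Phi(q) - q phi(q) of q^2 phi(q). Because q^2 phi(q) is even,
    both windows give the same value, so only |g| matters. Phi(c) - Phi(0) is written
    as erf(c / sqrt(2)) / 2 to avoid cancellation when c is tiny.
    """
    c = abs(g) * n ** (-1.0 / 3.0)
    integral = 0.5 * math.erf(c / math.sqrt(2.0)) - c * normal_pdf(c)
    return n * integral

def cubic_limit(g):
    """Limit 3^{-1} f(0) |g|^3 with f(0) = 1 / sqrt(2 pi)."""
    return normal_pdf(0.0) * abs(g) ** 3 / 3.0

for n in (10**3, 10**6, 10**9):
    # ratio of the finite-n quantity to its cubic limit
    print(n, scaled_second_moment(1.5, n) / cubic_limit(1.5))
```

Since $f'(0)=0$ for the standard normal, the relative error here shrinks like $c^{2}$ with $c=|g|n^{-1/3}$, in line with the remainder of the third-order expansion.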

The second term in the definition of $\widetilde{\mathbb{G}}_{n}^{2}\left(g\right),$ that is $-2\sum_{t=1}^{n}q_{t}\varepsilon_{t}\mathbf{1}_{t}\left(0;\frac{g}{n^{1/3}}\right)$ , converges weakly to $2\delta_{30}\sqrt{3^{-1}f\left(0\right)\sigma_{\varepsilon}^{2}\left(0\right)}W\left(g^{3}\right)$ . To see this, note that Lemma 1, specifically (46), yields the tightness of the process, as explained in Remark 3. For the finite dimensional convergence, we verify the conditions of the central limit theorem for martingale difference sequences (e.g. Theorem 3.2 of Hall and Heyde (1980)). In particular, we need to show that for $u_{nt}=\sqrt{n}q_{t}\varepsilon_{t}\mathbf{1}_{t}\left(0;\frac{g}{n^{1/3}}\right)$ ,

[TABLE]

For $\left(i\right)$ , note that $En^{-2}\max_{t}\left|u_{nt}\right|^{4}\leq n^{-1}E\left|u_{nt}\right|^{4}=nEq_{t}^{4}\varepsilon_{t}^{4}\mathbf{1}_{t}\left(0;\frac{g}{n^{1/3}}\right)\rightarrow 0$ as $n\rightarrow\infty$ . For $\left(ii\right)$ , apply the same argument as for the first term in $\widetilde{\mathbb{G}}_{n}^{2}\left(g\right)$ together with an expansion similar to that in $\left(\ref{cubic exp}\right)$ . We now characterize the covariance kernel. If $g_{1}$ and $g_{2}$ have different signs, the cross product is zero, whereas for $g_{2}>g_{1}>0$ , similarly to $\left(\ref{cubic exp}\right)$ , we have that

[TABLE]

The cases $g_{1}>g_{2}>0$ and $g_{2}<g_{1}<0$ are similar and thus omitted.

Finally, the covariance between $n^{-1/2}\sum_{t=1}^{n}\mathbf{x}_{t}\varepsilon_{t}$ and $\sum_{t=1}^{n}q_{t}\varepsilon_{t}\mathbf{1}_{t}\left(0;g/n^{1/3}\right)$ vanishes by the same reasoning, which yields the independence between $\widetilde{h}$ and $\widetilde{g}$ and thus the asymptotic independence between $\widehat{\alpha}$ and the threshold estimator $\widehat{\gamma}$ . $\blacksquare$

A.3 Proof of Proposition 2

Due to the asymptotic independence between $\widehat{\alpha}$ and $\widehat{\gamma}$ in Theorem 1, see (29) in its proof, we have that

[TABLE]

which corresponds to $\min_{g}\widetilde{\mathbb{G}}_{n}^{2}\left(g\right)$ in the proof of Theorem 1 due to the reparameterization $g=n^{1/3}\left(\gamma-\gamma_{0}\right)$ . It also shows that

[TABLE]

Finally, the desired result follows from applying the change of variables $g^{3}=3\phi\sigma_{\varepsilon}^{2}\left(\gamma_{0}\right)/\left(\delta_{30}^{2}f\left(\gamma_{0}\right)\right)$ , because of the distributional equivalences $W\left(a^{2}g\right)=^{d}aW\left(g\right)$ and $W\left(s\right)=^{d}-W\left(s\right)$ , and the fact that $\min_{x}k\left(x\right)=-\max_{x}\left\{-k\left(x\right)\right\}$ for any function $k$ . $\blacksquare$
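For concreteness, the change of variables can be written out as follows. This is a sketch that assumes the limit objective in $g$ is the sum of the two limits identified in the proof of Theorem 1, namely $3^{-1}\delta_{30}^{2}f(\gamma_{0})|g|^{3}$ and $2\delta_{30}\sqrt{3^{-1}f(\gamma_{0})\sigma_{\varepsilon}^{2}(\gamma_{0})}\,W(g^{3})$, with $W$ a two-sided Brownian motion and $g^{3}<0$ for $g<0$; the constant $c$ is introduced here only for the derivation.

$$
\begin{aligned}
c &= \frac{3\sigma_{\varepsilon}^{2}(\gamma_{0})}{\delta_{30}^{2}f(\gamma_{0})},\qquad g^{3}=c\,\phi,\\[4pt]
3^{-1}\delta_{30}^{2}f(\gamma_{0})\,|g|^{3} &= 3^{-1}\delta_{30}^{2}f(\gamma_{0})\,c\,|\phi| = \sigma_{\varepsilon}^{2}(\gamma_{0})\,|\phi|,\\[4pt]
2\delta_{30}\sqrt{3^{-1}f(\gamma_{0})\sigma_{\varepsilon}^{2}(\gamma_{0})}\;W(c\,\phi) &\overset{d}{=} 2\delta_{30}\sqrt{3^{-1}f(\gamma_{0})\sigma_{\varepsilon}^{2}(\gamma_{0})}\;\sqrt{c}\,W(\phi) = 2\operatorname{sign}(\delta_{30})\,\sigma_{\varepsilon}^{2}(\gamma_{0})\,W(\phi).
\end{aligned}
$$

Since $g\mapsto\phi$ is a bijection and $W\overset{d}{=}-W$,

$$
\min_{g}\Bigl\{3^{-1}\delta_{30}^{2}f(\gamma_{0})|g|^{3}+2\delta_{30}\sqrt{3^{-1}f(\gamma_{0})\sigma_{\varepsilon}^{2}(\gamma_{0})}\,W(g^{3})\Bigr\}
\overset{d}{=}\sigma_{\varepsilon}^{2}(\gamma_{0})\min_{\phi}\bigl\{|\phi|-2W(\phi)\bigr\}
=-\sigma_{\varepsilon}^{2}(\gamma_{0})\max_{\phi}\bigl\{2W(\phi)-|\phi|\bigr\}.
$$

In this sketch the factor $\sigma_{\varepsilon}^{2}(\gamma_{0})$ is the only quantity separating the limit from the Hansen (2000) functional $\max_{\phi}\{2W(\phi)-|\phi|\}$.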

A.4 Proof of Theorem 2

It is known that the distribution function of $\max_{g\in\mathbb{R}}\left(2W\left(g\right)-\left|g\right|\right)$ is $F$ , as in Hansen (2000). Thus, under Assumption C, Propositions 2 and 3 yield the conclusion, while under Assumption J, the conclusion follows from Theorem 2 of Hansen (2000). $\blacksquare$
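The distribution function $F$ has the closed form $F(x)=(1-e^{-x/2})^{2}$ for $x\geq0$ (Hansen, 2000). One way to see it: on each side of zero, $\max_{g\geq0}(2W(g)-g)$ is the supremum of a Brownian motion with variance $4$ per unit time and drift $-1$, which is exponential with rate $1/2$, and the two sides are independent. The Python sketch below, whose function names are our own, inverts $F$ to obtain critical values and checks $F$ against a crude random-walk simulation of $\max_{g}(2W(g)-|g|)$; the grid step and horizon are illustrative choices, and the discretization slightly underestimates the maximum.

```python
import math
import numpy as np

def hansen_cdf(x):
    """F(x) = P(max_g [2W(g) - |g|] <= x) = (1 - exp(-x/2))^2 for x >= 0."""
    return (1.0 - math.exp(-x / 2.0)) ** 2 if x > 0 else 0.0

def hansen_critical_value(level):
    """Invert F: the level-quantile is -2 log(1 - sqrt(level))."""
    return -2.0 * math.log(1.0 - math.sqrt(level))

def simulate_max(reps=2000, horizon=30.0, step=0.02, seed=0):
    """Monte Carlo draws of max_g [2W(g) - |g|] on a grid over |g| <= horizon.

    W is a two-sided Brownian motion with W(0) = 0, built from two independent
    one-sided random walks with N(0, step) increments.
    """
    rng = np.random.default_rng(seed)
    m = int(round(horizon / step))
    g = step * np.arange(1, m + 1)          # grid points g > 0 on each side
    draws = np.empty(reps)
    for r in range(reps):
        w = np.cumsum(rng.normal(scale=math.sqrt(step), size=(2, m)), axis=1)
        # row 0 is the right half-line, row 1 the left; the value at g = 0 is 0
        draws[r] = max(0.0, float(np.max(2.0 * w - g)))
    return draws

for level in (0.90, 0.95, 0.99):
    print(level, round(hansen_critical_value(level), 2))

c95 = hansen_critical_value(0.95)
print("simulated P(max <= c95):", np.mean(simulate_max() <= c95))
```

The printed critical values agree with the 90%, 95% and 99% values 5.94, 7.35 and 10.59 tabulated by Hansen (2000).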

A.5 Proof of Theorem 3

Recalling our definition of $\widehat{\alpha}^{\ast}$ and $\widehat{\gamma}^{\ast}$ in $\left(\ref{theta_star}\right)$ , we note that their consistency and rate of convergence are established in Proposition 5.

We now discuss the asymptotic distribution of the bootstrap estimators. We begin with part $\left(\mathbf{a}\right)$ and assume $\gamma_{0}=0$ to simplify notation. Because of the argmin continuous mapping theorem invoked in the proof of Theorem 1, it suffices to examine the weak limit of

[TABLE]

where $\left\|h\right\|,\left|g\right|\leq C$ .

First, recall that $\widetilde{\delta}_{1}=O_{p}\left(n^{-1/2}\right)$ and $\widetilde{\delta}_{2}=O_{p}\left(n^{-1/2}\right)$ under Assumption C, and note that Lemmas 1 and 4 imply that, uniformly in $\left\|h\right\|,\left|g\right|<C$ ,

[TABLE]

This implies that

[TABLE]

where

[TABLE]

The consequence of $\left(\ref{diffboot}\right)$ is then that the minimizer of $\mathbb{G}_{n}^{\ast}\left(h,g\right)$ is asymptotically equivalent to that of $\widetilde{\mathbb{G}}_{n}^{\ast}\left(h,g\right)$ . Thus, it suffices to show the weak convergence of $\widetilde{\mathbb{G}}_{1n}^{\ast}\left(h\right)$ and $\widetilde{\mathbb{G}}_{2n}^{\ast}\left(g\right)$ and that

[TABLE]

are $O_{p^{\ast}}\left(1\right)$ . The convergence of $\widetilde{\mathbb{G}}_{1n}^{\ast}\left(h\right)$ and its minimization follow by standard arguments, as it is a quadratic function of $h$ , so it suffices to examine $\widetilde{\mathbb{G}}_{2n}^{\ast}\left(g\right)$ and its minimum.

Turning to the second term in the definition of $\widetilde{\mathbb{G}}_{2n}^{\ast}\left(g\right),$ we show that it converges weakly (in probability) to $2\delta_{30}\sqrt{3^{-1}f\left(0\right)\sigma_{\varepsilon}^{2}\left(0\right)}W\left(g^{3}\right)$ . To this end, note that Lemma 4, together with Remark 4 that follows it, yields the tightness of the process, as explained in Remark 3. The finite dimensional convergence follows by standard arguments, as

[TABLE]

which converges in probability to $3^{-1}f\left(0\right)\sigma_{\varepsilon}^{2}\left(0\right)g^{3}$ , and Lindeberg’s condition follows easily.

Part $\left(\mathbf{b}\right)$ is also proved similarly and thus omitted for the sake of space. $\blacksquare$

A.6 Proof of Theorem 4

This is a direct consequence of Theorem 3, Proposition 4 and the same arguments as in the proof of Theorem 2. $\blacksquare$

Online Supplement to “Robust Inference in Threshold Regression Models”

by Javier Hidalgo, Jungyoon Lee, and Myung Hwan Seo

This supplement contains more numerical results for Section 5 and the remaining proofs of main theorems and supporting lemmas.

Appendix B-1 Table 4 for Monte Carlo study in Section 5

In Table 4, we report Monte Carlo size and coverage probability results for $\gamma$ when $\varphi=0$ , with $\delta$ fixed at $\sqrt{10}/4=0.7906$ and $0.25$ , in setting A ( $q_{t}\neq x_{t}$ ) with homoscedastic errors. In Table 2 of Hansen (2000), the Monte Carlo coverage probability of his asymptotic confidence interval is reported in a similar setup. He found that coverage rates increase with larger $\delta$ and larger $n$ , rising well above the nominal rate. Similar results are reported for Hansen’s asymptotic method in our Table 4: for $\delta=0.7906$ , under-sizing of the test $H_{0}:\gamma=\gamma_{0}$ and over-coverage of confidence intervals for $\gamma$ are severe for all $n$ . For $\delta=0.25$ , the under-sizing and over-coverage become an issue for larger $n=250,500$ . On the other hand, for $\delta=0.7906$ our bootstrap method led to some over-sizing and severe under-coverage for all $n$ . For $\delta=0.25$ , results were more satisfactory, with the Monte Carlo size close to the nominal size for all $n$ and the coverage probability approaching the nominal level as $n$ increases.
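Monte Carlo size and coverage are rejection and inclusion frequencies over replications. As a hedged illustration of how such numbers can be computed for Hansen’s asymptotic method only (the bootstrap procedure is not reproduced), the Python sketch below simulates a hypothetical jump design with $x_{t}=(1,x_{t2},q_{t})^{\prime}$, searches a trimmed grid of candidate thresholds, and computes Hansen's (2000) likelihood ratio $LR_{n}(\gamma_{0})=n\{S_{n}(\gamma_{0})-S_{n}(\widehat{\gamma})\}/S_{n}(\widehat{\gamma})$, rejecting when it exceeds his asymptotic 95% critical value 7.35. The design, sample size, trimming and function names are assumptions. Because the confidence set is obtained by inverting the same test, coverage at $\gamma_{0}$ equals one minus the size.

```python
import numpy as np

def ssr(y, Z):
    """Sum of squared OLS residuals from regressing y on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return float(resid @ resid)

def threshold_ssr(y, X, q, gamma):
    """S_n(gamma): SSR with regressors (x_t, x_t 1{q_t > gamma})."""
    Z = np.column_stack([X, X * (q > gamma)[:, None]])
    return ssr(y, Z)

def lr_stat(y, X, q, gamma0, grid):
    """LR_n(gamma0) = n (S_n(gamma0) - S_min) / S_min, S_min taken over the grid and gamma0."""
    s0 = threshold_ssr(y, X, q, gamma0)
    s_min = min([s0] + [threshold_ssr(y, X, q, g) for g in grid])
    return len(y) * (s0 - s_min) / s_min

def monte_carlo_size(n=100, delta=0.79, reps=200, crit=7.35, seed=0):
    """Rejection frequency of H0: gamma = gamma0 in a hypothetical jump design."""
    rng = np.random.default_rng(seed)
    gamma0 = 0.0
    beta = np.array([0.0, 1.0, 1.0])            # coefficients on (1, x_t2, q_t)
    rejections = 0
    for _ in range(reps):
        q = rng.normal(size=n)
        X = np.column_stack([np.ones(n), rng.normal(size=n), q])
        y = X @ beta + delta * (q > gamma0) + rng.normal(size=n)  # intercept jumps by delta
        grid = np.quantile(q, np.linspace(0.15, 0.85, 41))         # trimmed candidate thresholds
        if lr_stat(y, X, q, gamma0, grid) > crit:
            rejections += 1
    return rejections / reps

size = monte_carlo_size()
print("size:", size, "coverage:", 1.0 - size)
```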

Appendix B-2 Proofs of Propositions 3, 4 and 5

B-2.1 Proof of Proposition 3

Recalling our notation in $\left(\ref{x_not}\right)$ and that $\delta_{1}+\delta_{3}\gamma_{0}=0$ and $\delta_{2}=0$ under Assumption C, we then have that

[TABLE]

Because we can rename $q_{t}-\gamma_{0}$ as $q_{t}$ , we shall assume without loss of generality that $\gamma_{0}=0$ so that $\delta_{1}=0$ .

Consider the case where $\widehat{\gamma}>0$ . The proof when $\widehat{\gamma}<0$ is analogous and thus it is omitted. By construction, we have that

[TABLE]

Because $\left(\delta_{1},\delta_{2}^{\prime}\right)=0$ and $\widehat{\beta}-\beta=O_{p}\left(n^{-1/2}\right)$ , $\widehat{\delta}-\delta=O_{p}\left(n^{-1/2}\right)$ and $\widehat{\gamma}=O_{p}\left(n^{-1/3}\right)$ , we obtain that

[TABLE]

Now $\left(\ref{deltax}\right)$ implies that $\left(\widehat{\delta}^{\prime}x_{t}\right)^{2}=\delta_{3}^{2}q_{t}^{2}+O_{p}\left(n^{-1/2}\right)\delta_{3}\left\|x_{t}\right\|q_{t}+O_{p}\left(n^{-1}\right)$ . So, by Lemmas 2 and 3 and by standard arguments using $na^{3}\rightarrow\infty$ , we conclude that the numerator of $\left(\ref{xhihat}\right)$ behaves like

[TABLE]

when $\kappa_{2}\neq 0$ , that is, when higher-order kernels are not used. Observe that $g_{0}\left(q\right)$ in Lemma 2 corresponds to $\sigma^{2}\left(q\right)$ . More specifically, the contribution of the other terms in $\left(\ref{eps}\right)$ is negligible by Lemma 3.

Similarly, the leading term in the denominator in $\left(\ref{xhihat}\right)$ is

[TABLE]

So, the convergence in $\left(\ref{xhihat}\right)$ follows from the last two displayed expressions. Finally, it is standard to show that $\mathbb{S}_{n}(\widehat{\theta})-\sigma^{2}=o_{p}\left(1\right)$ . This completes the proof of the proposition. $\blacksquare$
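A minimal numerical sketch of the ratio behind this argument follows. It assumes, in the spirit of Hansen’s (2000) scale estimator and of the leading terms above, that the estimator divides a kernel-weighted sum of $(\widehat{\delta}^{\prime}x_{t})^{2}\widehat{e}_{t}^{2}$ by the same sum without residuals times $\mathbb{S}_{n}(\widehat{\theta})$, so that the common factor $\kappa_{2}a^{2}f(0)$ cancels. For simplicity it uses the true errors in place of residuals, $\gamma_{0}=0$, $\delta^{\prime}x_{t}=\delta_{3}q_{t}$ (the kink restriction), a Gaussian kernel and a heteroskedastic design with $\sigma^{2}(q)=1+q^{2}$; all names and design choices are hypothetical. The ratio should be close to $\sigma^{2}(0)/E\sigma^{2}(q_{t})=1/2$, up to an $O(a^{2})$ bias.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density used as the kernel K."""
    return np.exp(-0.5 * u * u) / np.sqrt(2.0 * np.pi)

def scale_ratio(q, resid, delta_x, gamma_hat, a):
    """Hypothetical Hansen-type ratio
    sum_t w_t e_t^2 / (sum_t w_t * mean(e_t^2)),  w_t = (delta'x_t)^2 K((q_t - gamma_hat) / a).
    The (n a)^{-1} normalizations of numerator and denominator cancel.
    """
    w = delta_x ** 2 * gaussian_kernel((q - gamma_hat) / a)
    return np.sum(w * resid ** 2) / (np.sum(w) * np.mean(resid ** 2))

rng = np.random.default_rng(0)
n = 200_000
q = rng.normal(size=n)
eps = np.sqrt(1.0 + q ** 2) * rng.normal(size=n)  # sigma^2(q) = 1 + q^2: sigma^2(0) = 1, E sigma^2 = 2
delta3 = 1.0
# Kink restriction with gamma_0 = 0: delta'x_t = delta_3 q_t; true errors stand in for residuals.
xi_hat = scale_ratio(q, eps, delta3 * q, 0.0, n ** (-1 / 5))
print(xi_hat)
```

Under homoskedastic errors the same ratio should instead be close to one, which is the case in which no rescaling of the Hansen (2000) critical values is needed.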

B-2.2 Proof of Proposition 4

As before, we assume $\gamma_{0}=0$ . We prove the proposition under Assumption C; the case of Assumption J is similar and thus omitted. Let $\widehat{\gamma}^{\ast}>0$ ; the case $\widehat{\gamma}^{\ast}<0$ is analogous and thus omitted. We shall examine the behaviour of the numerator of $\left(\ref{xhihatBoot}\right)$ , its denominator being handled similarly. By construction,

[TABLE]

Recall that when the constraint given in $\left(\ref{eq:conti}\right)$ holds true, $\widetilde{\delta}_{2}$ and $\widetilde{\delta}_{1}$ are both $O_{p}\left(n^{-1/2}\right)$ . On the other hand, Proposition 5 yields that $\widehat{\beta}^{\ast}-\widetilde{\beta}=O_{p^{\ast}}\left(n^{-1/2}\right)$ , $\widehat{\delta}^{\ast}-\widetilde{\delta}=O_{p^{\ast}}\left(n^{-1/2}\right)$ and $\widehat{\gamma}^{\ast}=O_{p^{\ast}}\left(n^{-1/3}\right)$ . Then, $\left(\widehat{\delta}^{\ast\prime}x_{t}\right)^{2}=\left(\widetilde{\delta}^{\prime}x_{t}\right)^{2}+O_{p^{\ast}}\left(n^{-1/2}\right)\widetilde{\delta}^{\prime}x_{t}q_{t}+O_{p^{\ast}}\left(n^{-1}\right)$ . Proceeding as in the proof of Proposition 3, we easily deduce that

[TABLE]

By obvious arguments and those in $\left(\ref{eps_1Boot}\right)$ , it suffices to examine the behaviour of

[TABLE]

Now, because $\widetilde{\delta}_{2}$ and $\widetilde{\delta}_{1}$ are both $O_{p}\left(n^{-1/2}\right)$ when $\left(\ref{eq:conti}\right)$ holds true, the behaviour of the last displayed expression is governed by

[TABLE]

which is $\kappa_{2}\delta_{30}^{2}a^{2}E^{\ast}\left[\varepsilon_{t}^{\ast 2}\mid q_{t}=\gamma_{0}\right]f\left(0\right)\left(1+o_{p^{\ast}}\left(1\right)\right)$ by Lemma 5 when $\kappa_{2}\neq 0$ , that is, when higher-order kernels are not used. Notice that the contribution of the other terms in $\left(\ref{epsBoot}\right)$ is negligible by Lemma 6.

Likewise, the denominator in $\left(\ref{xhihatBoot}\right)$ is

[TABLE]

So, the convergence in $\left(\ref{xhihatBoot}\right)$ follows from the last two displayed expressions. Finally, it is standard that $\mathbb{S}_{n}(\widehat{\theta}^{\ast})-\sigma^{2}=o_{p^{\ast}}\left(1\right)$ . This completes the proof of the proposition. $\blacksquare$

B-2.3 Convergence Rate of Bootstrap Estimator

Proposition 5.

*Suppose that Assumptions Z and Q hold. Then,*

$\left(\mathbf{a}\right)$ *Under Assumption C,*

[TABLE]

$\left(\mathbf{b}\right)$ *Under Assumption J,*

[TABLE]

Proof of Proposition 5. Assuming without loss of generality that $\gamma\geq\widehat{\gamma}=\gamma_{0}$ , abbreviating $\widehat{\psi}-\psi$ by $\overline{\psi}$ for any parameter $\psi$ , and proceeding as in the proof of Proposition 1, we obtain that

[TABLE]

where

[TABLE]

where, in what follows, for a generic sequence $\left\{z_{t}\right\}_{t\in\mathbb{Z}}$ we employ the notation $M_{n}^{z}\left(\gamma\right)=\frac{1}{n}\sum_{t=1}^{n}z_{t}z_{t}^{\prime}\mathbf{1}_{t}\left(\gamma\right)$ and $M_{n}^{z}\left(\gamma_{1};\gamma_{2}\right)=\frac{1}{n}\sum_{t=1}^{n}z_{t}z_{t}^{\prime}\mathbf{1}_{t}\left(\gamma_{1};\gamma_{2}\right)$ . It is also worth recalling that for $n$ large enough $0<\sup_{\gamma\in\Gamma}\left\|M_{n}^{x}\left(\gamma\right)\right\|=H_{n}$ and $0<\sup_{\gamma_{1}<\gamma_{2}}\left\|M_{n}^{x}\left(\gamma_{1};\gamma_{2}\right)\right\|=H_{n}$ , where in what follows $H_{n}$ denotes a sequence of strictly positive $O_{p}\left(1\right)$ random variables. Finally, as in the proof of Proposition 1, because $E\left(x_{t}x_{t}^{\prime}\mathbf{1}_{t}\left(\gamma\right)\right)$ and $E\left(x_{t}x_{t}^{\prime}\mathbf{1}_{t}\left(0;\gamma\right)\right)$ are finite and strictly positive definite matrices, and $M_{n}^{x}\left(-\infty;\gamma\right)-E\left(x_{t}x_{t}^{\prime}\mathbf{1}_{t}\left(-\infty;\gamma\right)\right)=O_{p}\left(n^{-1/2}\right)$ and $M_{n}^{x}\left(\gamma\right)-E\left(x_{t}x_{t}^{\prime}\mathbf{1}_{t}\left(\gamma\right)\right)=O_{p}\left(n^{-1/2}\right)$ uniformly in $\gamma\in\Gamma$ , we have that

[TABLE]

where $\overline{\tau}=\left(\widehat{\beta}-\beta\right)+\widehat{\delta}$ . The argument is the one employed in the proof of Proposition 1, after observing that Proposition 1 implies $\widehat{\gamma}-\gamma_{0}=O_{p}\left(n^{-1/3}\right)$ and that Lemma 1 implies, uniformly in $\gamma_{1}<\gamma_{2}\in\Gamma$ ,

[TABLE]

together with the fact that $M_{n}^{x}\left(-\infty;\widehat{\gamma}\right)=M_{n}^{x}\left(-\infty;\gamma_{0}\right)+M_{n}^{x}\left(\gamma_{0};\widehat{\gamma}\right)$ .

Consistency. We begin with part $\left(\mathbf{a}\right)$ . Arguing as in the proof of Proposition 1, it suffices to show that

[TABLE]

First, $\left\|\overline{\theta}\right\|>\eta$ implies that either $\left(\mathbf{i}\right)$ $\left\|\overline{\gamma}\right\|>\eta/2$ or $\left(\mathbf{ii}\right)$ $\left\|\overline{\beta}\right\|,\left\|\overline{\upsilon}\right\|>\eta/2$ . When $\left(\mathbf{ii}\right)$ holds true, it is clear that

[TABLE]

whereas when $\left(\mathbf{i}\right)$ holds true, we obtain that

[TABLE]

because $E\left(x_{t}x_{t}^{\prime}\mathbf{1}_{t}\left(\gamma\right)\right)$ and $E\left(x_{t}x_{t}^{\prime}\mathbf{1}_{t}\left(0;\gamma\right)\right)$ are strictly positive definite matrices (since, for instance, $E\left(x_{t}x_{t}^{\prime}\mathbf{1}_{t}\left(0;\gamma\right)\right)-E\left(x_{t}x_{t}^{\prime}\mathbf{1}_{t}\left(0;\eta/4\right)\right)$ is a positive definite matrix when $\left\|\overline{\gamma}\right\|>\eta/2$ ), $M_{n}^{x}\left(\widehat{\gamma};\gamma\right)=E\left(x_{t}x_{t}^{\prime}\mathbf{1}_{t}\left(0;\gamma\right)\right)\left(1+o_{p}\left(1\right)\right)$ and $\widehat{\mathbb{A}}_{n\ell}\left(\theta\right)-E\left(\mathbb{A}_{n\ell}\left(\theta\right)\right)=o_{p}\left(1\right)$ . Recall that $E\left(a^{\prime}x_{t}\mathbf{1}_{t}\left(0;\eta\right)\right)>\eta\min_{q\in\left(0,\eta\right)}f\left(q\right)E\left(a^{\prime}x_{t}\right)$ . So, $\left(\ref{ineqboot_2}\right)$ and $\left(\ref{ineqboot_3}\right)$ imply that

[TABLE]

On the other hand, Lemma 4 implies that

[TABLE]

so that

[TABLE]

Thus $\left(\ref{ineqboot_7}\right)$ and $\left(\ref{ineqboot_4}\right)$ yield that $\widehat{\theta}^{\ast}-\widehat{\theta}=o_{p^{\ast}}\left(1\right)$ , because the left side of $\left(\ref{consiboot_2}\right)$ is bounded by

[TABLE]

using Markov’s inequality. This concludes the proof of consistency.

Convergence rate. We shall show that for some $C>0$ large enough and $\epsilon>0$ ,

[TABLE]

To that end, we shall first examine

[TABLE]

where $j=1,\ldots,\log_{2}\left(\eta n^{1/2}/C\right)$ and $k=1,\ldots,\log_{2}\left(\eta n^{1/3}/C\right)$ , and $\Xi_{j}\left(\upsilon\right)$ and $\Xi_{k}\left(\gamma\right)$ are defined similarly to $\left(\ref{set_jk}\right)$ . Recall that we have assumed that $\gamma\geq 0$ ; the case $\gamma\leq 0$ follows similarly.

Now Lemma 4 implies that

[TABLE]

Observe that the bound in $\left(\ref{rateboot_2}\right)$ is independent of $k$ , i.e. the set $\Xi_{k}\left(\gamma\right)$ . Defining

[TABLE]

$\left(\ref{definboot}\right)$ yields that

[TABLE]

by Lemma 4; once again, the bound is independent of $k$ .

Next, define

[TABLE]

then, because $\widehat{\tau}=H_{n}+C_{1}$ ,

[TABLE]

by Lemma 4 and Markov’s inequality. Observe that the latter displayed bound is independent of $j$ , i.e. the set $\Xi_{j}\left(\upsilon\right)$ .

So, the left side of $\left(\ref{rateboot_1}\right)$ is bounded by

[TABLE]

using $\left(\ref{rateboot_2}\right)-\left(\ref{rateboot_4}\right)$ . This concludes the proof of part $\left(\mathbf{a}\right)$ .

The proof of part $\left(\mathbf{b}\right)$ is similarly handled after obvious changes, so it is omitted. $\blacksquare$

Appendix B-3 AUXILIARY LEMMAS

We begin with a set of maximal inequalities, which play a central role in deriving convergence rates and tightness of various empirical processes. For $j=1$ or $2,$ let

[TABLE]

and for some sequence $\left\{z_{t}\right\}_{t=1}^{n}$ ,

[TABLE]

Lemma 1.

Suppose Assumptions Z and Q hold for the sequence $\left\{x_{t},\varepsilon_{t}\right\}_{t=1}^{n}$ . In addition, for $J_{3n}\left(\gamma\right),$ assume that $\left\{z_{t},q_{t}\right\}_{t=1}^{n}$ is a strictly stationary, ergodic and $\rho$ -mixing sequence with $\sum_{m=1}^{\infty}\rho_{m}^{1/2}<\infty$ , $E\left|z_{t}\right|^{4}<\infty$ and, for all $\gamma\in\Gamma$ , $E\left(\left|z_{t}\right|^{4}\mid q_{t}=\gamma\right)<C<\infty$ . Then, there exists $n_{0}<\infty$ such that for all $\gamma^{\prime}$ in a neighbourhood of $\gamma_{0}$ , for all $n>n_{0}$ and for all $\epsilon\geq n_{0}^{-1}$ ,

[TABLE]

where $j=1$ or $2$ .

Proof.

Part $\left(\mathbf{a}\right)$ proceeds as in Hansen’s $\left(2000\right)$ Lemma A.3, so it is omitted.

Next, part $\left(\mathbf{b}\right)$ . The proof is almost identical to that of Hansen’s $\left(2000\right)$ Lemma A.3, once we observe that if $|\gamma_{1}-\gamma^{\prime}|\leq\epsilon$ , $|\gamma_{2}-\gamma^{\prime}|\leq\epsilon$ and $h_{t}(\gamma_{1};\gamma_{2})=|\varepsilon_{t}(q_{t}-\gamma_{0})^{j}|\mathbf{1}_{t}(\gamma_{1};\gamma_{2})$ , then the bound (12) in his Lemma A.1 should be updated to

[TABLE]

where $C<\infty$ and $\epsilon_{1}=\epsilon+\left|\gamma_{0}-\gamma^{\prime}\right|$ , since $E\left(\left|\varepsilon_{t}\right|^{r}\mid q_{t}\right)$ and the density $f\left(q\right)$ of $q_{t}$ are bounded around $q_{t}=\gamma_{0}$ . For the same reason, Hansen’s bound (13) should be changed to $|\gamma_{1}-\gamma_{2}|\epsilon_{1}^{jr}$ . These new bounds imply that the bounds (15) and (16) in his Lemma A.3 and the bounds (18) and (20) in the proof of his Lemma A.2 change to $\left|\gamma_{1}-\gamma_{2}\right|^{2}\epsilon_{1}^{4j}$ and $n^{-1}\left|\gamma_{1}-\gamma_{2}\right|\epsilon_{1}^{4j}+\left|\gamma_{1}-\gamma_{2}\right|^{2}\epsilon_{1}^{4j}$ , respectively, which yields the desired bound in $\left(\ref{eq:maxineq2a}\right)$ .

Part $\left(\mathbf{c}\right)$ . For notational simplicity we assume that $\gamma_{0}=0$ . Let $\gamma_{k}=k/n$ , for $k=1,...,m$ , where $m=\left[\epsilon n\right]+1$ . By the triangle inequality,

[TABLE]

Now, because $f\left(\cdot\right)$ is continuously differentiable at $\gamma_{0}$ , standard algebra yields that

[TABLE]

Next, using $\left(\ref{eq:max_1}\right)$

[TABLE]

Thus, using the inequality $\left(\sup_{j=1,...,\ell}\left|c_{j}\right|\right)^{4}\leq\sum_{j=1}^{\ell}\left|c_{j}\right|^{4}$ , we conclude that the second term on the right of $\left(\ref{eq:max_2b_1}\right)$ has absolute moment bounded by

[TABLE]

However, from Lemma 3.6 of Peligrad $\left(1982\right)$ , for any $k>i$ ,

[TABLE]

So, using again $\left(\ref{eq:max_1}\right)$ and the facts that $m=[\epsilon n]+1$ and $n^{-1}<\epsilon$ , we conclude that the first moment of the second term on the right of $\left(\ref{eq:max_2b_1}\right)$ is bounded by $C\epsilon^{j+1/2}$ .

Next, the first moment of the first term on the right of $\left(\ref{eq:max_2b_1}\right)$ is also bounded by $C\epsilon^{j+1/2}$ , by Billingsley’s $\left(1968\right)$ Theorem 12.2 and the last displayed inequality.

Finally, part $\left(\mathbf{d}\right)$ . The proof is similar to that of $\left(\ref{eq:maxineq2b}\right)$ . It is sufficient to note that, with $J_{3n}\left(\gamma\right)$ , the bounds in $\left(\ref{eq:max_1}\right)$ and $\left(\ref{eq:max_2}\right)$ change to $C/n^{1/2}$ and $C\epsilon^{2}$ , respectively. This yields the result as $n^{-1}<\epsilon$ .

Remark 3.

One consequence of parts $\left(\mathbf{a}\right)$ and $\left(\mathbf{b}\right)$ of the previous lemma, which allow the maximal inequality to hold for any $\gamma^{\prime}$ in a neighbourhood of $\gamma_{0}$ , is that

[TABLE]

which can be made small by choosing small $\epsilon$ and $r_{n}\rightarrow\infty$ . This is used to verify the stochastic equicontinuity of the rescaled and reparameterized empirical processes in the proof of Theorem 1.

The following two lemmas are used in the proof of Proposition 3. Before we state our next lemma, we need to introduce some notation. In what follows

[TABLE]

Note that we have implicitly assumed that $g_{r}\left(q\right)$ and $f\left(q\right)$ have four continuous derivatives. Also, without loss of generality, we assume $\gamma_{0}=0$ and $x_{t2}$ is a scalar to ease notation.

Lemma 2.

Under $\mathbf{K1}$ , $\mathbf{K2}$ and $\mathbf{K4}$ , we have that for integers $0\leq\ell,r\leq 4$ ,

[TABLE]

Proof.

First, observe that we are using the normalization $\left(na^{1+\ell}\right)^{-1}$ instead of the standard $\left(na\right)^{-1}$ . This is due to the factor $q_{t}^{\ell}$ : with $\gamma_{0}=0$ , the change of variables $q=au$ gives $E\left\{q_{t}^{\ell}K\left(q_{t}/a\right)\right\}=a^{1+\ell}\int u^{\ell}K\left(u\right)f\left(au\right)du$ . We shall consider only the first equality in $\left(\ref{kernel_22}\right)$ , the second one being similarly handled. Abbreviating $K_{t}\left(\gamma\right)=K\left(\frac{q_{t}-\gamma}{a}\right)$ , we have that standard kernel arguments imply

[TABLE]

So, to complete the proof of the lemma, it suffices to show that

[TABLE]

Proposition 1 implies that, for any $\eta>0$ , there exists $C$ such that $\Pr\left\{\left|\widehat{\gamma}\right|>Cn^{-1/3}\right\}\leq\eta$ . So, we only need to show that $\left(\ref{kernel_3}\right)$ holds true when $\left|\widehat{\gamma}\right|\leq Cn^{-1/3}$ . In that case, the left side of $\left(\ref{kernel_3}\right)$ is bounded by

[TABLE]

The expectation of the second term on the right of $\left(\ref{kernel_31}\right)$ is bounded by

[TABLE]

because by $\mathbf{K1}$ , $\kappa_{\ell}<C_{1}$ , for $\ell\leq 4$ .

For some $0<\psi<1$ , the first term on the right of $\left(\ref{kernel_31}\right)$ is bounded by

[TABLE]

because $\mathbf{K4}$ implies that $\gamma=o\left(a\right)$ when $\left|\gamma\right|\leq Cn^{-1/3}$ , and hence, if $a^{3/2}<\left|q_{t}\right|<a^{1/2}$ , we have $\left|K^{\prime}\left(\frac{q_{t}-\phi\gamma}{a}\right)/K^{\prime}\left(\frac{q_{t}}{a}\right)\right|\leq C_{1}$ by $\mathbf{K2}$ . Moreover, it is well known that the first moment of the first term on the right of $\left(\ref{ineq}\right)$ is bounded, whereas that of the second term on the right is also bounded because $E\left|\frac{q_{t}}{a}\right|^{\ell}\mathbf{1}\left(\left|q_{t}\right|<a^{3/2}\right)<a^{\left(\ell+3\right)/2}$ and

[TABLE]

So, the expectation of the first term on the right of $\left(\ref{kernel_31}\right)$ is $O\left(n^{-1/3}\right)$ . This concludes the proof of the lemma.
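The role of the normalization $(na^{1+\ell})^{-1}$ in Lemma 2 can be seen in a quick simulation. The Python sketch below is only illustrative: it draws $q_{t}\sim N(0,1)$ (so $f(0)=1/\sqrt{2\pi}$), uses a Gaussian kernel and, taking $\kappa_{\ell}=\int u^{\ell}K(u)\,du$ (equal to $1,0,1,0,3$ for $\ell=0,\dots,4$ for this kernel), compares $(na^{1+\ell})^{-1}\sum_{t}q_{t}^{\ell}K(q_{t}/a)$ with $\kappa_{\ell}f(0)$. The bandwidth, sample size and function names are arbitrary, hypothetical choices.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density used as the kernel K."""
    return np.exp(-0.5 * u * u) / np.sqrt(2.0 * np.pi)

def normalized_kernel_sum(q, ell, a):
    """(n a^{1+ell})^{-1} sum_t q_t^ell K(q_t / a), with gamma_0 = 0."""
    return np.sum(q ** ell * gaussian_kernel(q / a)) / (q.size * a ** (1 + ell))

rng = np.random.default_rng(0)
q = rng.normal(size=500_000)                      # q_t ~ N(0, 1), so f(0) = 1 / sqrt(2 pi)
f0 = 1.0 / np.sqrt(2.0 * np.pi)
kappa = {0: 1.0, 1: 0.0, 2: 1.0, 3: 0.0, 4: 3.0}  # Gaussian-kernel moments int u^ell K(u) du
a = 0.1                                           # illustrative bandwidth
for ell in range(5):
    print(ell, normalized_kernel_sum(q, ell, a), kappa[ell] * f0)
```

Without the extra factor $a^{\ell}$ in the normalization, the sums for $\ell\geq1$ would vanish at rate $a^{\ell}$; the remaining gap to $\kappa_{\ell}f(0)$ at $a=0.1$ is an $O(a^{2})$ smoothing bias.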

Lemma 3.

Under $\mathbf{K1}$-$\mathbf{K4}$ , we have that for integers $0\leq r,\ell\leq 4$ ,

[TABLE]

Proof.

To simplify the notation, we assume that $r=0$ . The left side of $\left(\ref{prop9_1}\right)$ is

[TABLE]

The second term is easily shown to be $O_{p}\left(n^{-1/2}a^{\ell-1/2}\right)$ . Next, the first term of the last displayed expression is

[TABLE]

where $\zeta=1-2/\ell$ , if $\ell>2$ , and $\zeta<1$ if $\ell\leq 2$ . The second term of $\left(\ref{eps_1}\right)$ is

[TABLE]

whose first absolute moment is bounded by

[TABLE]

because by $\mathbf{K1}$ , $\kappa_{4}<\infty$ . So, to complete the proof we need to examine the first term of $\left(\ref{eps_1}\right)$ , which, using the characteristic function of the kernel, is

[TABLE]

But it is clear that the last displayed expression is bounded by

[TABLE]

using that $\zeta=1-2/\ell$ , if $\ell\geq 2$ and $\zeta<1$ when $0\leq\ell<2$ , $\widehat{\gamma}=O_{p}\left(n^{-1/3}\right)$ and $\mathbf{K4}$ . This concludes the proof of the lemma.

We now extend the maximal inequalities in Lemma 1 to their bootstrap analogues. Define $J_{n}^{\ast}\left(\gamma,\gamma^{\prime}\right)$ and $J_{1n}^{\ast}\left(\gamma,\gamma^{\prime}\right)$ by replacing $\varepsilon_{t}$ in $J_{n}$ and $J_{1n}$ with $\widehat{e}_{t}\eta_{t}$ , that is

[TABLE]

and recall that $H_{n}$ denotes a sequence of positive $O_{p}\left(1\right)$ random variables.
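The bootstrap errors $\widehat{e}_{t}\eta_{t}$ are of wild-bootstrap type: conditionally on the data they have mean zero and variance $\widehat{e}_{t}^{2}$. The Python sketch below illustrates this under the assumption that $\eta_{t}$ is standard normal (the text only requires $\eta_{t}\sim iid(0,1)$ with a finite fourth moment, so, for example, Rademacher weights would also qualify); the residual values and the function name are hypothetical.

```python
import numpy as np

def wild_bootstrap_errors(resid, n_boot, rng):
    """Draw bootstrap errors e_t * eta_t with eta_t iid N(0, 1), independent of the data.

    Returns an (n, n_boot) array; column b is one bootstrap error vector.
    """
    eta = rng.standard_normal(size=(resid.shape[0], n_boot))
    return resid[:, None] * eta

rng = np.random.default_rng(0)
resid = np.array([-1.2, 0.3, 0.8, 2.0])           # hypothetical residuals e_t
draws = wild_bootstrap_errors(resid, 100_000, rng)
print(draws.mean(axis=1))   # bootstrap mean of e_t eta_t, near zero
print(draws.var(axis=1))    # bootstrap variance of e_t eta_t, near e_t^2
```

Because each $\eta_{t}$ multiplies its own residual, the bootstrap errors inherit any conditional heteroskedasticity of $\widehat{e}_{t}$, which is what allows $\sigma_{\varepsilon}^{2}\left(\gamma_{0}\right)$ to be mimicked in the bootstrap world.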

Lemma 4.

Under Assumption Z, we have that for all $\epsilon,\varsigma>0$ , there exists $\zeta>0$ such that

[TABLE]

Proof.

We shall assume for notational simplicity that $\gamma_{0}<\widehat{\gamma}$ , and set $\gamma_{j}=\gamma_{1}+\frac{\zeta}{m}j$ with $n\zeta/2<m<n\zeta$ , which is possible because $n$ can be chosen such that $n\zeta>1$ . By definition,

[TABLE]

Now, by standard inequalities and the fact that $\eta_{t}\sim iid\left(0,1\right)$ with a finite fourth moment, the fourth (bootstrap) moment of the right side of the last displayed equation is bounded by

[TABLE]

Because for fixed $\zeta>0$ , there exists $n_{0}$ such that for $n>n_{0}$ , $Cn^{-1}<\zeta$ , the expectation of the first term of $\left(\ref{boot_12}\right)$ is bounded by

[TABLE]

arguing as in Hansen’s $\left(2000\right)$ Lemma A.3, with $\zeta_{m}=\zeta/m$ .

Next, recalling that $\widehat{\gamma}=\gamma_{0}+D/n^{1/3}$ , because $\boldsymbol{1}\left(\gamma_{j}<q_{t}<\gamma_{k}\right)\boldsymbol{1}\left(\gamma_{0}<q_{t}<\widehat{\gamma}\right)\leq\boldsymbol{1}\left(\gamma_{j}<q_{t}<\gamma_{k}\right)$ , the expectation of the fourth term of $\left(\ref{boot_12}\right)$ is bounded by

[TABLE]

Finally, the second and third terms of $\left(\ref{boot_12}\right)$ are

[TABLE]

From here we conclude that $\left(\ref{propboot_1}\right)$ holds true, and hence the lemma, proceeding as in Hansen’s $\left(2000\right)$ Lemma A.3, in particular his expressions $\left(20\right)-\left(22\right)$ , because a sequence of random variables with uniformly bounded first moments is $O_{p}\left(1\right)$ . The proof of $\left(\ref{propboot_2}\right)$ proceeds similarly and is thus omitted.

Remark 4.

One of the consequences of the previous lemma is that

[TABLE]

which can be made small by choosing small $\epsilon$ and $r_{n}\rightarrow\infty$ .

Lemma 5.

Under $\mathbf{K1}$ , $\mathbf{K2}$ and $\mathbf{K4}$ , we have that for integers $0\leq\ell,r\leq 4$ ,

[TABLE]

Proof.

We shall consider only the first equality in $\left(\ref{kernel_22Boot}\right)$ , the second one being similarly handled. Now standard kernel arguments imply

[TABLE]
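The "standard kernel arguments" can be illustrated generically. Let $f$ be the density of $q_{t}$ (with the threshold normalized to zero), and let $a\rightarrow0$ with $na\rightarrow\infty$. Then $\left(na\right)^{-1}\sum_{t}\left(q_{t}/a\right)^{\ell}K\left(q_{t}/a\right)$ converges to $f\left(0\right)\int u^{\ell}K\left(u\right)du$. The display above is not recoverable here, so this sketch is not the statistic in $\left(\ref{kernel_22Boot}\right)$. The Epanechnikov kernel, the uniform design, the bandwidth and the function names are all our assumptions.

```python
import random


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_average(q, a, ell):
    """Hypothetical generic average (n a)^{-1} * sum_t (q_t / a)^ell * K(q_t / a)."""
    n = len(q)
    return sum((qt / a) ** ell * epanechnikov(qt / a) for qt in q) / (n * a)


rng = random.Random(123)
q = [rng.uniform(-1.0, 1.0) for _ in range(200_000)]  # uniform on (-1, 1), so f(0) = 0.5
est0 = kernel_average(q, a=0.05, ell=0)  # target f(0) * int K(u) du = 0.5
est2 = kernel_average(q, a=0.05, ell=2)  # target f(0) * int u^2 K(u) du = 0.5 * 0.2 = 0.1
print(round(est0, 3), round(est2, 3))
```

Because the uniform density is flat on the kernel window, this particular design has no smoothing bias. Only sampling noise of order $\left(na\right)^{-1/2}$ remains.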

So, to complete the proof of the lemma, it suffices to show that

[TABLE]

Proposition 5 implies that there exists $C>0$ such that $\Pr^{\ast}\left\{\left|\widehat{\gamma}^{\ast}\right|>Cn^{-1/3}\right\}\leq H_{n}$. So, we only need to show that $\left(\ref{kernel_3}\right)$ holds when $\left|\widehat{\gamma}^{\ast}\right|\leq Cn^{-1/3}$. In that case, the left side of $\left(\ref{kernel_3Boot}\right)$ is bounded by

[TABLE]
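If $\left(\ref{kernel_3Boot}\right)$ is read as a statement in bootstrap probability, the reduction to $\left|\widehat{\gamma}^{\ast}\right|\leq Cn^{-1/3}$ rests on the following elementary bound. Here $\mathcal{A}_{n}$ is a generic event (our notation) standing for the quantity in $\left(\ref{kernel_3Boot}\right)$ exceeding a given tolerance:

```latex
\Pr^{\ast}\left(\mathcal{A}_{n}\right)
  \leq\Pr^{\ast}\left(\mathcal{A}_{n}\cap\left\{\left|\widehat{\gamma}^{\ast}\right|\leq Cn^{-1/3}\right\}\right)
  +\Pr^{\ast}\left\{\left|\widehat{\gamma}^{\ast}\right|>Cn^{-1/3}\right\}
  \leq\Pr^{\ast}\left(\mathcal{A}_{n}\cap\left\{\left|\widehat{\gamma}^{\ast}\right|\leq Cn^{-1/3}\right\}\right)+H_{n}.
```

Thus only the first probability, computed on the event $\left|\widehat{\gamma}^{\ast}\right|\leq Cn^{-1/3}$, requires further work.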

The expectation of the second term on the right of $\left(\ref{kernel_31Boot}\right)$ is bounded by

[TABLE]

where $C_{1}$ denotes a generic positive finite constant. Now,

[TABLE]

proceeding as we did in Lemma 2. So, we conclude that the second term on the right of $\left(\ref{kernel_31Boot}\right)$ is $o\left(a^{2-\ell/4}\right)H_{n}$.

For some $0<\psi<1$ , the first term on the right of $\left(\ref{kernel_31Boot}\right)$ is bounded by

[TABLE]

because $\mathbf{K4}$ implies that $\gamma=o\left(a\right)$ when $\left|\gamma\right|\leq Cn^{-1/3}$, and hence $\left|K^{\prime}\left(\frac{q_{t}-\phi\gamma}{a}\right)/K^{\prime}\left(\frac{q_{t}}{a}\right)\right|\leq C_{1}$ by $\mathbf{K2}$ if $a^{3/2}<\left|q_{t}\right|<a^{1/2}$. It is well known that the first moment of the first term on the right of $\left(\ref{ineqBoot}\right)$ is bounded. The first moment of the second term is bounded as well, because $E\left|\frac{q_{t}}{a}\right|^{\ell}\mathbf{1}\left(\left|q_{t}\right|<a^{3/2}\right)<a^{\left(\ell+3\right)/2}$ and by $\left(\ref{k_1}\right)$. So, the expectation of the first term on the right of $\left(\ref{kernel_31Boot}\right)$ is $O_{p}\left(n^{-1/3}\right)$. This concludes the proof of the lemma.
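Consider the truncated-moment bound. If $q_{t}$ has a density bounded by $\bar{f}$ (a symbol we introduce), then
$E\left|\frac{q_{t}}{a}\right|^{\ell}\mathbf{1}\left(\left|q_{t}\right|<a^{3/2}\right)\leq\bar{f}a^{-\ell}\int_{-a^{3/2}}^{a^{3/2}}\left|q\right|^{\ell}dq=\frac{2\bar{f}}{\ell+1}a^{\left(\ell+3\right)/2}$.
The sketch below checks this numerically with a midpoint rule. It assumes a standard normal $q_{t}$ purely for illustration. In that case $\bar{f}=\left(2\pi\right)^{-1/2}$, so the constant $2\bar{f}/\left(\ell+1\right)$ is below one.

```python
import math


def truncated_moment(a, ell, cells=20_000):
    """Midpoint-rule value of E|q/a|^ell 1(|q| < a^{3/2}) for q ~ N(0, 1)."""
    b = a ** 1.5  # truncation point a^{3/2}
    h = 2.0 * b / cells  # width of each cell on (-b, b)
    total = 0.0
    for i in range(cells):
        q = -b + (i + 0.5) * h  # midpoint of cell i
        density = math.exp(-0.5 * q * q) / math.sqrt(2.0 * math.pi)
        total += abs(q / a) ** ell * density
    return total * h


for a in (0.5, 0.1, 0.01):
    for ell in range(5):
        ratio = truncated_moment(a, ell) / a ** ((ell + 3) / 2)
        print(a, ell, round(ratio, 3))  # stays below 2 * phi(0) / (ell + 1) < 1
```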

Lemma 6.

Under $\mathbf{K1-K4}$, we have that for integers $0\leq r,\ell\leq 4$,

[TABLE]

Proof.

To simplify the notation, we assume that $r=0$ . The left side of $\left(\ref{prop9_1Boot}\right)$ is

[TABLE]

The second term is easily shown to be $O_{p^{\ast}}\left(n^{-1/2}a^{\ell-1/2}\right)$ , whereas the first term is

[TABLE]

where $\zeta=1-2/\ell$ if $\ell>2$ and $\zeta<1$ if $\ell\leq 2$ . The second term of $\left(\ref{eps_1Boot}\right)$ is

[TABLE]

whose first absolute bootstrap moment is

[TABLE]

Proceeding as in Lemma 5, we conclude that the second term of $\left(\ref{eps_1Boot}\right)$ is $O_{p^{\ast}}\left(a^{\ell}\right)$. To complete the proof, it therefore remains to examine the first term of $\left(\ref{eps_1Boot}\right)$. As with the first term of $\left(\ref{eps_1}\right)$, this term is

[TABLE]

The last displayed expression is bounded by

[TABLE]

using $\mathbf{K4}$, the fact that $\zeta=1-2/\ell$ if $\ell\geq 2$ and $\zeta<1$ when $0\leq\ell<2$, and $\widehat{\gamma}^{\ast}=O_{p^{\ast}}\left(n^{-1/3}\right)$. By standard arguments, this yields

[TABLE]

This concludes the proof of the lemma.

References

[1] Peligrad, M. (1982). “Invariance principles for mixing sequences of random variables”, The Annals of Probability, 10, 968-981.

Bibliography

[1] Abrevaya, J., and Huang, J. (2005). “On the bootstrap of the maximum score estimator”, Econometrica, 73, 1175-1204.
[2] Angrist, J. D., and Lavy, V. (1999). “Using Maimonides’ rule to estimate the effect of class size on scholastic achievement”, Quarterly Journal of Economics, 114, 533-575.
[3] Bai, J., and Perron, P. (1998). “Estimating and testing linear models with multiple structural changes”, Econometrica, 66, 47-78.
[4] Brown, L. D., Casella, G., and Hwang, J. T. G. (1995). “Optimal confidence sets, bioequivalence, and the Limaçon of Pascal”, Journal of the American Statistical Association, 90, 880-889.
[5] Caner, M., Grennes, T., and Koehler-Geib, F. (2010). “Finding the tipping point-when sovereign debt turns bad”, Policy Research Working Paper Series 5391, The World Bank.
[6] Card, D., Mas, A., and Rothstein, J. (2008). “Tipping and dynamics of segregation”, Quarterly Journal of Economics, 123, 177-218.
[7] Carpenter, J. (1999). “Test inversion bootstrap confidence intervals”, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61, 159-172.
[8] Chan, K. S. (1993). “Consistency and limiting distribution of the least squares estimator of a threshold autoregressive model”, The Annals of Statistics, 21, 520-533.

Robust Inference for Threshold Regression Models

We thank anonymous referees and an Associate Editor for their constructive comments. M. Seo

Abstract

1 INTRODUCTION

2 MODEL AND ESTIMATORS

Assumption Z.

Assumption Q.

2.1 Estimators

3 Robust Confidence Regions

Assumption C.

Proposition 1.

Theorem 1.

Remark 1.

3.1 Inference on Regression Coefficient $\alpha$

3.2 Inference on Threshold $\gamma$

Assumption J.

Proposition 2.

Assumption K.

Proposition 3.

Remark 2.

Theorem 2.

4 BOOTSTRAP

4.1 Bootstrap Algorithm for each $\gamma_{j}$

Theorem 3.

Proposition 4.

Theorem 4.

5 Monte Carlo Experiment

6 EMPIRICAL APPLICATION: GROWTH AND DEBT

7 CONCLUSION

Appendix A PROOFS OF MAIN THEOREMS

A.1 Proof of Proposition 1

A.2 Proof of Theorem 1

A.3 Proof of Proposition 2

A.4 Proof of Theorem 2

A.5 Proof of Theorem 3

A.6 Proof of Theorem 4

Appendix B-1 Table 4 for Monte Carlo study in Section 5

Appendix B-2 Proofs of Propositions 3 and 4 and Proposition 5

B-2.1 Proof of Proposition 3

B-2.2 Proof of Proposition 4

B-2.3 Convergence Rate of Bootstrap Estimator

Proposition 5.

Appendix B-3 AUXILIARY LEMMAS

Lemma 1.

Proof.

Remark 3.

Lemma 2.

Proof.

Lemma 3.

Proof.

Lemma 4.

Proof.

Remark 4.

Lemma 5.

Proof.

Lemma 6.

Proof.

References
