Nonparametric Regression with Multiple Thresholds: Estimation and Inference

Yan-Yu Chiou; Mei-Yuan Chen; Jau-er Chen

arXiv:1705.09418·q-fin.EC·February 26, 2018

TL;DR

This paper develops methods for estimating and testing the number and values of multiple thresholds in nonparametric regression models with an exogenous threshold variable, supported by simulations and an empirical application.

Contribution

It introduces a testing procedure to determine the unknown number of thresholds and derives their asymptotic properties, advancing nonparametric regression analysis.

Findings

01

The proposed test accurately identifies the number of thresholds.

02

Sequential estimation of threshold values is precise.

03

Monte Carlo simulations confirm the test's effectiveness.

Abstract

This paper examines nonparametric regression with an exogenous threshold variable, allowing for an unknown number of thresholds. Given the number of thresholds and corresponding threshold values, we first establish the asymptotic properties of the local constant estimator for a nonparametric regression with multiple thresholds. However, the number of thresholds and corresponding threshold values are typically unknown in practice. We then use our testing procedure to determine the unknown number of thresholds and derive the limiting distribution of the proposed test. The Monte Carlo simulation results indicate the adequacy of the modified test and accuracy of the sequential estimation of the threshold values. We apply our testing procedure to an empirical study of the 401(k) retirement savings plan with income thresholds.

Tables

Table 1: Critical values of $F_{n}(s+1|s)$

$s + 1$	10%	5%	1%
1	1.281552	1.644854	2.326348
2	1.632219	1.954508	2.574961
3	1.818281	2.121201	2.711943
4	1.943196	2.234002	2.805821
5	2.036469	2.318679	2.876895
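
The entries of Table 1 can be reproduced numerically. The sketch below rests on an observation about the numbers, not on a statement in the text: each tabulated value coincides with the upper quantile of the maximum of $s+1$ independent standard normal variables. The function name `max_normal_critical_value` is hypothetical and introduced only for illustration.

```python
from statistics import NormalDist


def max_normal_critical_value(k: int, alpha: float) -> float:
    """Upper-alpha critical value of the maximum of k independent N(0, 1) draws."""
    # P(max <= c) = Phi(c) ** k, so c = Phi^{-1}((1 - alpha) ** (1 / k)).
    return NormalDist().inv_cdf((1.0 - alpha) ** (1.0 / k))


# Print one row per value of s + 1, at the 10%, 5%, and 1% levels.
for k in range(1, 6):
    row = [max_normal_critical_value(k, a) for a in (0.10, 0.05, 0.01)]
    print(k, " ".join(f"{c:.6f}" for c in row))
```

If this reading is right, the row for $s+1=1$ reduces to the usual one-sided standard normal critical values.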

Table 2: Empirical sizes of $F_{n}(s+1|s)$ with $h=c\cdot\sigma\cdot n^{-1/4.25}$, $c=1$

Nominal size	$n=500$	$n=1000$	$n=2000$
1%	0.021	0.017	0.011
5%	0.045	0.051	0.051
10%	0.076	0.084	0.086

Table 3: Empirical sizes of $F_{n}(s+1|s)$ with $h=c\cdot\sigma\cdot n^{-1/4.25}$, $n=2000$

Nominal size	$c=1.24$	$c=1.30$	$c=1.37$
1%	0.018	0.011	0.018
5%	0.050	0.044	0.056
10%	0.087	0.090	0.086

Table 4: Performance of the Threshold Estimations

$n$	$\bar{\hat{\gamma}}_{3}$	$se(\hat{\gamma}_{3})$	MSE($\hat{\gamma}_{3}$)	$\bar{\hat{\gamma}}_{2}$	$se(\hat{\gamma}_{2})$	MSE($\hat{\gamma}_{2}$)	$\bar{\hat{\gamma}}_{1}$	$se(\hat{\gamma}_{1})$	MSE($\hat{\gamma}_{1}$)
500	0.4227	0.2542	0.0705	0.1775	0.0960	0.0100	-0.6523	0.2407	0.0602
1000	0.4867	0.1245	0.0160	0.1529	0.0320	0.0010	-0.6894	0.1198	0.0140
3000	0.5025	0.0079	$6.9\times 10^{-6}$	0.1498	0.0077	$5.9\times 10^{-5}$	-0.7029	0.0040	$2.4\times 10^{-5}$

Equations

$E(Y\mid\mathbf{X},Q)=\sum_{j=1}^{s+1}m_{\gamma_{j}}(\mathbf{X})\,I_{\gamma_{j}}(Q)$

$I_{\gamma_{j}}(Q)$

$m_{\gamma_{j}}(\mathbf{x})$

$Y_{i}=\sum_{j=1}^{s+1}m_{\gamma_{j}}(\mathbf{X}_{i})\,I_{\gamma_{j}}(Q_{i})+e_{i}$

$\mathcal{K}_{h}(\mathbf{u})\equiv h^{-p}\mathcal{K}(\mathbf{u}/h)$

$\hat{f}_{\gamma_{j}}(y,\mathbf{x})$

$\hat{f}_{\gamma_{j}}(\mathbf{x})$

$\hat{m}_{\gamma_{j}}(\mathbf{x})=\frac{\sum_{i=1}^{n}\mathcal{K}_{h}(\mathbf{X}_{i}-\mathbf{x})\,I_{\gamma_{j}}(Q_{i})\,Y_{i}}{\sum_{i=1}^{n}\mathcal{K}_{h}(\mathbf{X}_{i}-\mathbf{x})\,I_{\gamma_{j}}(Q_{i})}$

$\sup|\hat{f}_{\gamma_{j}}(\mathbf{x})-f_{\gamma_{j}}(\mathbf{x})|=O_{p}\left(h^{r}+(\ln(n))^{1/2}/(nh^{p})^{1/2}\right),\quad j=1,\ldots,s+1$

$(nh^{p})^{1/2}\left\{\hat{f}_{\gamma_{j}}(\mathbf{x})-f_{\gamma_{j}}(\mathbf{x})-\frac{1}{2}h^{2}C_{1}\sum_{l=1}^{p}f^{(2)}_{\gamma_{j},l}(\mathbf{x})\right\}\to N\left(0,C_{2}f_{\gamma_{j}}(\mathbf{x})\right)$

$C_{1}=\int u^{2}K(u)du,\quad C_{2}=\left[\int K^{2}(u)du\right]^{p},\quad f^{(2)}_{\gamma_{j},l}(\mathbf{x})=\frac{\partial^{2}f_{\gamma_{j}}(\mathbf{x})}{\partial x_{l}^{2}}$

$\sup|\hat{m}_{\gamma_{j}}(\mathbf{x})-m_{\gamma_{j}}(\mathbf{x})|=O_{p}\left(h^{r}+(\ln(n))^{1/2}/(nh^{p})^{1/2}\right),\quad j=1,\ldots,s+1$

$(nh^{p})^{1/2}\left[\hat{m}_{\gamma_{j}}(\mathbf{x})-m_{\gamma_{j}}(\mathbf{x})-AB(\mathbf{x})\right]\to N\left(0,C_{2}\frac{\sigma^{2}_{\gamma_{j}}(\mathbf{x})}{f_{\gamma_{j}}(\mathbf{x})}\right)$

$AB(\mathbf{x})=\frac{1}{2}h^{2}C_{1}\sum_{l=1}^{p}\left[m^{(2)}_{\gamma_{j},l}(\mathbf{x})f_{\gamma_{j}}(\mathbf{x})+2m^{(1)}_{\gamma_{j},l}(\mathbf{x})f^{(1)}_{\gamma_{j},l}(\mathbf{x})\right]/f_{\gamma_{j}}(\mathbf{x})$

$\mbox{MISE}(h)=\int\int\mbox{E}\left[\sum_{j=1}^{s+1}\left(\hat{m}_{\gamma_{j}}(\mathbf{x})-m_{\gamma_{j}}(\mathbf{x})\right)I_{\gamma_{j}}(q)\right]^{2}w(\mathbf{x})\,d\mathbf{x}\,dq$

$h_{opt}=\arg\min_{h}\mbox{MISE}(h)$

$H_{0}:\Pr[m(\mathbf{W},\mathbf{V})=m(\mathbf{W})]=1$

$I_{\gamma_{j-1},\tau_{j}}(Q_{i})=\begin{cases}1,&Q_{i}\in[\gamma_{j-1},\tau_{j}),\\ 0,&\text{else},\end{cases}\qquad I_{\tau_{j},\gamma_{j}}(Q_{i})=\begin{cases}1,&Q_{i}\in[\tau_{j},\gamma_{j}),\\ 0,&\text{else}\end{cases}$

$f_{\gamma_{j-1},\tau_{j}}(\mathbf{x},y)$

$f_{\gamma_{j-1},\tau_{j}}(\mathbf{x})$

$m_{\gamma_{j-1},\tau_{j}}(\mathbf{x})$

$H_{0}:\Pr[E(Y\mid\mathbf{X},Q;\gamma_{1},\ldots,\gamma_{s})=E(Y\mid\mathbf{X},Q;\gamma_{1},\ldots,\gamma_{j-1},\tau_{j},\gamma_{j},\ldots,\gamma_{s})]=1$

$\tilde{\Gamma}(\tau_{j})$

$a(\mathbf{X})$

$\sigma^{-1}(\tau_{j})\left\{nh^{p/2}\tilde{\Gamma}(\tau_{j})-h^{-p/2}\xi(\tau_{j})\right\}\overset{d}{\longrightarrow}N(0,1)$

$\xi(\tau_{j})=C_{2}\left[\xi_{1}(\tau_{j})+\xi_{2}(\tau_{j})\right]$

$\xi_{1}(\tau_{j})$

$\xi_{2}(\tau_{j})$

$\sigma^{2}(\tau_{j})=2C_{3}\left[\sigma^{2}_{1}(\tau_{j})+\sigma^{2}_{2}(\tau_{j})\right]$

$\sigma^{2}_{1}(\tau_{j})$

$\sigma^{2}_{2}(\tau_{j})$

$\sigma^{2}_{\gamma_{j}}(\mathbf{x})$

$\sigma^{2}_{\gamma_{j-1},\tau_{j}}(\mathbf{x})$

Taxonomy

Topics: Monetary Policy and Economic Impact · Housing Market and Economics · Spatial and Panel Data Analysis

Full text

Nonparametric Regression with Multiple Thresholds: Estimation and Inference

Yan-Yu Chiou$^{a}$, Mei-Yuan Chen$^{b,*}$, Jau-er Chen$^{c,*}$

$^{a}$ Institute of Economics, Academia Sinica, Taiwan.

$^{b}$ Department of Finance, National Chung Hsing University, Taiwan.

$^{c}$ Department of Economics, National Taiwan University, Taiwan.

2nd-round R&R at the Journal of Econometrics

We are grateful to the two anonymous referees for their constructive comments that have greatly improved this paper. We thank Ming-Yen Cheng for valuable discussions, and thank Zongwu Cai and the participants at the International Symposium on Recent Developments in Econometric Theory with Applications in Honor of Professor Takeshi Amemiya for their helpful comments. The usual disclaimer applies.

$^{*}$ Corresponding authors: National Chung Hsing University, Department of Finance, 250 Kuo Kuang Road, Taichung 402, Taiwan. Tel.: +886-4-22853323. E-mail address: mei_yuan@dragon.nchu.edu.tw (Mei-Yuan Chen); National Taiwan University, Department of Economics, No. 1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan. Tel.: +886-2-3366-8326. E-mail address: [email protected] (Jau-er Chen).

Keywords: nonparametric regression, threshold variable, threshold value, significance test

JEL Classification: C12; C13; C14

1 Introduction

Piecewise linearity has been widely used to model shifts in economic relationships under a regression framework. Most regressions with piecewise linearity can be represented as linear regressions with thresholds. For example, linear regressions with structural changes can be written as linear threshold regressions with the time index as the threshold variable. Among previous studies, Bai and Perron (1998, 2003), Qu and Perron (2007), and Yamamoto and Perron (2013) estimate and test linear regressions with structural changes, and Chen (2008), Qu (2008), and Oka and Qu (2011) estimate and test linear quantile regressions with structural changes. The threshold model splits the sample into classes based on the value of an observed variable (i.e., whether it exceeds a certain threshold). In empirical work, determining the threshold of economic variables such as tax rates or the optimal public debt ratio is relevant for policy makers. When the threshold is unknown, as is typical in practice, it must be estimated, which increases the complexity of the econometric problem. Nonetheless, theories of estimation and inference are well developed for linear models with exogenous regressors, including the works by Chan (1993), Hansen (1996, 1999, 2000), and Caner (2002).

The scope of threshold models has broadened considerably in recent years. In particular, discussions of piecewise linearity have been extended to nonparametric regressions. Su and Xiao (2008), for instance, test for structural changes in time-series nonparametric regression models, while Chen and Hong (2012) investigate how to test for smooth structural changes in time-series models by using nonparametric regressions. In addition, Chen and Hong (2013) extend their earlier study to test for smooth structural changes in panel data models. In economics, the regression discontinuity (RD) design has gradually emerged as a common tool in applied research. The validity of RD estimates depends crucially both on the threshold variable (also termed the running variable in the RD literature) and on an adequate description of the conditional mean function of the outcome variable. Since what looks like a jump at the threshold might simply be unaccounted for nonlinearity, the nonparametric approach plays an important role in the RD estimations (cf. Angrist and Pischke, 2009). For example, by allowing for an unknown threshold value in the RD framework, Henderson, Parmeter, and Su (2014) provide estimation and inference procedures for the threshold value in a nonparametric regression with one threshold. Although related to Henderson et al. (2014), which is a pioneering study examining the nonparametric regression with one threshold, our study analyzes nonparametric regression with multiple thresholds. Further, in contrast to Henderson et al. (2014), the threshold variable is excluded from the explanatory variables in our framework. In empirical applications, multiple thresholds might be present; however, the number of thresholds and the corresponding threshold values are typically unknown in practice. 
Therefore, identifying the unknown number of thresholds and estimating the threshold values are critical issues in a nonparametric regression with multiple thresholds, especially when conducting empirical studies. We thus propose a testing procedure to determine the unknown number of thresholds and derive the limiting distribution of the proposed test. To the best of our knowledge, the present study is the first to comprehensively investigate the aforementioned issues. This study develops a test procedure for testing the existence of thresholds, determining the number of thresholds, and estimating the values of thresholds in nonparametric regression. Specifically, this procedure is a modified significance test based on the work of Aït-Sahalia et al. (2001). In addition, we establish the consistency and asymptotic normality of the threshold value estimators by using the sequential method. Hence, this study complements the existing literature on estimating and testing multiple thresholds in nonparametric regression models. Further, we apply our testing procedure to an empirical study of the 401(k) retirement savings plan with income thresholds and identify four threshold values. Those crucial income threshold values are all above the median income value.

The rest of the paper is organized as follows. The model specification and estimation for a nonparametric regression with thresholds are introduced in Section 2. This section also summarizes the necessary assumptions for deriving our theoretical results of the test statistics and estimators under the known thresholds. Section 3 provides the test determining the unknown number of thresholds. Section 4 presents the statistical properties of the multiple threshold estimator. Section 5 investigates the performance of these tests by using Monte Carlo studies, while Section 6 presents an empirical application. Section 7 concludes. All the technical proofs are collected in the Appendix.

2 Model, Assumptions, and Asymptotics

We first fix the notation and consider the following threshold model, a nonparametric regression with $s$ thresholds and known threshold values:

$$E(Y\mid\mathbf{X},Q)=\sum_{j=1}^{s+1}m_{\gamma_{j}}(\mathbf{X})\,I_{\gamma_{j}}(Q),$$

where $Y$ is the outcome variable, $\mathbf{X}$ is a vector of covariates, $Q$ is the threshold variable, which is used to split the sample into $s+1$ distinct regimes, $\gamma_{1},\gamma_{2},\ldots,\gamma_{s+1}$ are the corresponding threshold values, and $I_{\gamma_{j}}(Q_{i})$ denotes an indicator function defined as

$$I_{\gamma_{j}}(Q_{i})=\begin{cases}1,&Q_{i}\in[\gamma_{j-1},\gamma_{j}),\\ 0,&\text{else},\end{cases}$$

with $\gamma_{0}=-\infty$ and $\gamma_{s+1}=\infty$. Accordingly, the conditional mean of the $j$th regime at a grid point $\mathbf{x}=[x_{1},\ldots,x_{p}]^{\prime}$ can be represented as

$$m_{\gamma_{j}}(\mathbf{x})=\frac{\int y\,f_{\gamma_{j}}(y,\mathbf{x})\,dy}{f_{\gamma_{j}}(\mathbf{x})},$$

where $f_{\gamma_{j}}(y,\mathbf{x})=\int I_{\gamma_{j}}(q)f(y,\mathbf{x},q)dq$ and $f_{\gamma_{j}}(\mathbf{x})=\int I_{\gamma_{j}}(q)f(\mathbf{x},q)dq$ denote the joint density function of $Y$ and $\mathbf{X}$ and the marginal density of $\mathbf{X}$ in the $j$th regime, respectively.

Given a sample with observations $\{(Y_{i},\mathbf{X}_{i}^{\prime},Q_{i})^{\prime},i=1,\ldots,n\}$ , the nonparametric regression with known $s$ thresholds is specified as

$$Y_{i}=\sum_{j=1}^{s+1}m_{\gamma_{j}}(\mathbf{X}_{i})\,I_{\gamma_{j}}(Q_{i})+e_{i},\tag{2}$$

where $Y_{i}$ , $\mathbf{X}_{i}$ , and $Q_{i}$ are the $i$ th sample observations of $Y$ , $\mathbf{X}$ , and $Q$ , respectively; $e_{i}$ is the regression error. Note that the threshold values satisfy $\gamma_{0}<\gamma_{1}<\ldots<\gamma_{s+1}$ .
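
To make the data structure concrete, the following is a minimal simulation sketch of the threshold regression above. Everything specific in it is an assumption made for illustration: the threshold values, the regime mean functions, the distributions of $X$, $Q$, and $e$, and the helper names `regime_index` and `simulate`. It is not the simulation design of the paper.

```python
import random


def regime_index(q, thresholds):
    """Return j in 1..s+1 such that gamma_{j-1} <= q < gamma_j.

    thresholds holds gamma_1 < ... < gamma_s; gamma_0 = -inf and
    gamma_{s+1} = +inf are implicit.
    """
    j = 1
    for gamma in thresholds:
        if q >= gamma:  # regimes are left-closed, right-open
            j += 1
    return j


def simulate(n, thresholds, regime_means, noise_sd=0.5, seed=0):
    """Draw n observations (y, x, q) with a scalar covariate x."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        q = rng.gauss(0.0, 1.0)
        j = regime_index(q, thresholds)
        y = regime_means[j - 1](x) + rng.gauss(0.0, noise_sd)
        data.append((y, x, q))
    return data


# Illustrative choices only: s = 3 thresholds and four regime mean functions.
thresholds = [-0.7, 0.15, 0.5]
regime_means = [lambda x: x, lambda x: x * x, lambda x: -x, lambda x: 1.0 + x]
sample = simulate(200, thresholds, regime_means)
```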

Given a $p$-dimensional product kernel function $\mathcal{K}(\mathbf{u})$, with $\mathcal{K}_{h}(\mathbf{u})$ defined as

$$\mathcal{K}_{h}(\mathbf{u})\equiv h^{-p}\mathcal{K}(\mathbf{u}/h),$$

the sample kernel density estimators of $f_{\gamma_{j}}(y,\mathbf{x})$ and $f_{\gamma_{j}}(\mathbf{x})$ are

[TABLE]

Thus, the standard Nadaraya-Watson kernel regression estimator of $m_{\gamma_{j}}(\mathbf{x})$ is

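$$\hat{m}_{\gamma_{j}}(\mathbf{x})=\frac{\sum_{i=1}^{n}\mathcal{K}_{h}(\mathbf{X}_{i}-\mathbf{x})\,I_{\gamma_{j}}(Q_{i})\,Y_{i}}{\sum_{i=1}^{n}\mathcal{K}_{h}(\mathbf{X}_{i}-\mathbf{x})\,I_{\gamma_{j}}(Q_{i})}.$$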
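
The regime-wise local constant estimator can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it uses a Gaussian product kernel with a single common bandwidth $h$, and the function names (`gaussian_kernel`, `product_kernel_h`, `nw_regime`) are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Univariate standard normal density, used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def product_kernel_h(diff, h):
    """K_h(u) = h^{-p} * prod_l K(u_l / h) for a p-vector diff."""
    value = 1.0
    for d in diff:
        value *= gaussian_kernel(d / h)
    return value / h ** len(diff)


def nw_regime(x, X, Y, Q, lower, upper, h):
    """Local constant estimate of the regime mean at x.

    Only observations with lower <= Q_i < upper enter the sums, which
    plays the role of the indicator for that regime.
    """
    num = 0.0
    den = 0.0
    for xi, yi, qi in zip(X, Y, Q):
        if lower <= qi < upper:
            w = product_kernel_h([a - b for a, b in zip(xi, x)], h)
            num += w * yi
            den += w
    if den == 0.0:
        raise ValueError("no kernel mass in this regime at x")
    return num / den


# Tiny usage example with p = 1: the third row lies outside the regime [0, 1).
X = [[0.0], [0.0], [1.0]]
Y = [1.0, 3.0, 100.0]
Q = [0.1, 0.2, 5.0]
estimate = nw_regime([0.0], X, Y, Q, lower=0.0, upper=1.0, h=0.5)
```

For the first and last regimes, `float("-inf")` and `float("inf")` can be passed as the bounds, matching $\gamma_{0}=-\infty$ and $\gamma_{s+1}=\infty$.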

2.1 Assumptions

To establish the asymptotic properties of the conditional mean estimator, $\hat{m}_{\gamma_{j}}(\mathbf{x})$ , and the density estimator, $\hat{f}_{\gamma_{j}}(y,\mathbf{x})$ , in the $j$ th regime, as well as the convergence rate of the optimal bandwidth selector, we make the following assumptions.

Assumption 1. The following assumptions are specified for the random variables under study.

1-1. $\mathbf{Z}_{i}=(Y_{i},\mathbf{X}_{i},Q_{i})$ is strictly stationary, ergodic, and $\beta$-mixing, with $\beta$ coefficients satisfying $\sum^{\infty}_{k=1}k^{2}[\beta(k)]^{\frac{\varepsilon}{1+\varepsilon}}<\infty$ for some fixed $\varepsilon>0$.

1-2. The density $f(y,\mathbf{x},q)$ is bounded away from zero and globally integrable on the compact support $S$ of the weighting function $a(\cdot)$, where $a(\cdot)$ is defined in Section 3.1 when we construct the proposed test statistic. Hence $\inf_{S}f(\mathbf{x},q)\equiv b>0$.

1-3. The joint density $f_{1,1+j}$ of $(\mathbf{Z}_{1},\mathbf{Z}_{1+j})$ exists for all $j$ and is continuous on $(R\times S)^{2}$.

1-4. $\mbox{E}[e^{4}_{i}|\mathbf{X}_{i}=\mathbf{x},Q_{i}=q]<\infty$, $\mbox{E}(e^{2}_{i}|\mathbf{X}_{i}=\mathbf{x},Q_{i}=q)=\sigma^{2}(\mathbf{x},q)$, and $\sigma^{2}(\mathbf{x},q)$ is square-integrable on $S$.

1-5. $\int|m_{\gamma_{l}}(\mathbf{x}_{i})-m_{\gamma_{k}}(\mathbf{x}_{i})|d\mathbf{x}_{i}\neq 0$, $l,k=1,\ldots,s+1$ and $l\neq k$.

Assumption 2. The following assumptions are imposed on the kernel function.

2-1. $\mathcal{K}$ is a product kernel, $\mathcal{K}=K_{1}\times\cdots\times K_{p}=K^{p}$, with $K_{i}=K$ for all $i$. It is a bounded function on $R^{p}$, symmetric about 0, with $\int|K(z)|dz<\infty$, $\int K(u)du=1$, $\int u^{j}K(u)du=0,j=1,\ldots,r-1$, and $\int u^{r}K(u)du<\infty$.

2-2. The kernel $K$ is $r$ times continuously differentiable with $r>3p/4$.

Assumption 3. The following assumptions are imposed on the bandwidth.

3-1. As $n\to\infty$, $h\to 0$, $nh^{p}\to\infty$, and $nh^{p+2r+2}\to 0$.

3-2. As $n\to\infty$, the bandwidth sequence $h=O(n^{-1/\delta})$ is such that $2p<\delta<2r+p/2$, so that $h\to 0$, $nh^{p}\to\infty$, and $nh^{p/2+2r}\to 0$.

Assumptions 1-1 and 1-3 are similar to Assumption 7 in Aït-Sahalia et al. (2001), allowing for dependent observed data such as macroeconomic or financial time series. Assumptions 1-2 and 1-4 generalize Assumption 2 of Aït-Sahalia et al. (2001) to encompass threshold models. Moreover, Assumptions 1-4 and 1-5 restrict the behavior of the conditional moments and conditional mean functions across distinct regimes. Assumption 2-1 states the standard restrictions on higher-order kernel functions, which are devices used to reduce bias (cf. Li and Racine, 2007). Assumption 2-2, however, implies that there is no need to use a higher-order kernel $(r>2)$ unless the dimensionality of the covariate is greater than or equal to 3. Assumption 3 imposes joint restrictions on the bandwidth sequence $h$, the order of the kernel $r$, the dimensionality of the covariate $p$, and the sample size $n$. In particular, when $p=1$ and $r=2$, the restriction $2p<\delta<2r+p/2$, which is also used by Aït-Sahalia et al. (2001), leads to $2<\delta<4.5$. In the Monte Carlo simulations of this study, we impose $\delta=4.25$, which satisfies this restriction, so that the asymptotic properties of the nonparametric estimator remain valid.
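
The bandwidth restriction is easy to check numerically. The sketch below computes the admissible interval for $\delta$ and the bandwidth form $h=c\cdot\sigma\cdot n^{-1/\delta}$ that appears in the captions of the simulation tables. Treating $\sigma$ as a scale measure of the covariate (for example, its sample standard deviation) is an assumption here, and the helper names `delta_bounds` and `bandwidth` are hypothetical.

```python
def delta_bounds(p: int, r: int):
    """End points of the open interval (2p, 2r + p/2) for delta."""
    return 2 * p, 2 * r + p / 2


def bandwidth(n: int, sigma: float, c: float = 1.0, delta: float = 4.25) -> float:
    """h = c * sigma * n^(-1/delta)."""
    return c * sigma * n ** (-1.0 / delta)


lower, upper = delta_bounds(p=1, r=2)
print("admissible delta:", lower, "< delta <", upper)
print("delta = 4.25 admissible:", lower < 4.25 < upper)

# Bandwidths for the sample sizes used in the size tables, with sigma = 1
# chosen purely for illustration.
for n in (500, 1000, 2000):
    print(n, round(bandwidth(n, sigma=1.0), 4))
```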

2.2 Asymptotic Properties of the Estimators under Known Thresholds

Assuming that the number of thresholds $s$ and corresponding threshold values $\gamma_{j},j=1,\ldots,s+1$ are known already, the consistency and asymptotic normality of $\hat{f}_{\gamma_{j}}(\mathbf{x})$ are provided in Theorem 1 and the asymptotic properties of $\hat{m}_{\gamma_{j}}(\mathbf{x})$ are stated in Theorem 2.

Theorem 1.

Suppose that Assumptions 1, 2, and 3-1 hold. The following results are established.

a) The uniform convergence rate of $\hat{f}_{\gamma_{j}}(\mathbf{x})$,

$$\sup_{\mathbf{x}}|\hat{f}_{\gamma_{j}}(\mathbf{x})-f_{\gamma_{j}}(\mathbf{x})|=O_{p}\left(h^{r}+(\ln(n))^{1/2}/(nh^{p})^{1/2}\right),\quad j=1,\ldots,s+1.$$

b) The asymptotic normality of $\hat{f}_{\gamma_{j}}(\mathbf{x})$,

$$(nh^{p})^{1/2}\left\{\hat{f}_{\gamma_{j}}(\mathbf{x})-f_{\gamma_{j}}(\mathbf{x})-\frac{1}{2}h^{2}C_{1}\sum_{l=1}^{p}f^{(2)}_{\gamma_{j},l}(\mathbf{x})\right\}\to N\left(0,C_{2}f_{\gamma_{j}}(\mathbf{x})\right),$$

where

$$C_{1}=\int u^{2}K(u)du,\quad C_{2}=\left[\int K^{2}(u)du\right]^{p},\quad f^{(2)}_{\gamma_{j},l}(\mathbf{x})=\frac{\partial^{2}f_{\gamma_{j}}(\mathbf{x})}{\partial x_{l}^{2}}.\quad\square$$

When the estimation is carried out at a single point $\mathbf{x}$, the convergence rate is $O_{p}(h^{r}+1/(nh^{p})^{1/2})$. In empirical applications, the estimator is typically evaluated at many points $\mathbf{x}$, in which case it has the slower uniform convergence rate $O_{p}(h^{r}+(\ln(n))^{1/2}/(nh^{p})^{1/2})$. Part b) shows that the kernel density estimator is asymptotically biased. When a Gaussian product kernel is used, $C_{1}=1$ and $C_{2}=1/(2\sqrt{\pi})^{p}$, according to Aït-Sahalia et al. (2001). Moreover, given that the number of thresholds $s$ and corresponding threshold values $\gamma_{j},j=1,\ldots,s+1$ are known, the consistency and asymptotic normality of $\hat{m}_{\gamma_{j}}(\mathbf{x})$ are provided as follows.

Theorem 2.

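Suppose that Assumptions 1, 2, and 3-1 hold. The following results are derived.

a) The uniform convergence rate of $\hat{m}_{\gamma_{j}}(\mathbf{x})$,

$$\sup_{\mathbf{x}}|\hat{m}_{\gamma_{j}}(\mathbf{x})-m_{\gamma_{j}}(\mathbf{x})|=O_{p}\left(h^{r}+(\ln(n))^{1/2}/(nh^{p})^{1/2}\right),\quad j=1,\ldots,s+1.$$

b) The asymptotic normality of $\hat{m}_{\gamma_{j}}(\mathbf{x})$,

$$(nh^{p})^{1/2}\left[\hat{m}_{\gamma_{j}}(\mathbf{x})-m_{\gamma_{j}}(\mathbf{x})-AB(\mathbf{x})\right]\to N\left(0,C_{2}\frac{\sigma^{2}_{\gamma_{j}}(\mathbf{x})}{f_{\gamma_{j}}(\mathbf{x})}\right),$$

where $AB(\mathbf{x})$ denotes the asymptotic bias,

$$AB(\mathbf{x})=\frac{1}{2}h^{2}C_{1}\sum_{l=1}^{p}\left[m^{(2)}_{\gamma_{j},l}(\mathbf{x})f_{\gamma_{j}}(\mathbf{x})+2m^{(1)}_{\gamma_{j},l}(\mathbf{x})f^{(1)}_{\gamma_{j},l}(\mathbf{x})\right]/f_{\gamma_{j}}(\mathbf{x}),$$

and $m^{(1)}_{\gamma_{j},l}(\mathbf{x})=\frac{\partial m_{\gamma_{j}}(\mathbf{x})}{\partial x_{l}}$ and $m^{(2)}_{\gamma_{j},l}(\mathbf{x})=\frac{\partial^{2}m_{\gamma_{j}}(\mathbf{x})}{\partial x_{l}^{2}}$ are the first- and second-order derivatives of the $j$th regime's conditional mean with respect to the $l$th explanatory variable, respectively. $\square$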

It is now clear that the sample estimator $\hat{m}_{\gamma_{j}}(\mathbf{x})$ is also asymptotically biased. However, this asymptotic bias could be reduced by using higher-order kernels. Notice that the convergence rates and asymptotic results of $\hat{f}_{\gamma_{j}}(\mathbf{x})$ and $\hat{m}_{\gamma_{j}}(\mathbf{x})$ are not affected by $s$ , the number of thresholds. In finite samples, the number of thresholds does affect the nonparametric estimation. However, at the limit, the convergence rate does not depend on $s$ . Our results are therefore similar to those presented by Li and Racine (2007).
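
The Gaussian-kernel constants quoted after Theorem 1, $C_{1}=1$ and $C_{2}=1/(2\sqrt{\pi})^{p}$, can be verified by direct numerical integration. The sketch below does this for the univariate factor with a plain trapezoid rule. The integration range and step count are arbitrary choices, and the helper names are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Univariate standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def integrate(f, a=-12.0, b=12.0, steps=24000):
    """Composite trapezoid rule on [a, b]."""
    width = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * width)
    return total * width


# C_1 is the second moment of K; the univariate factor of C_2 is the
# integral of K squared, which is then raised to the power p.
c1 = integrate(lambda u: u * u * gaussian_kernel(u))
c2_univariate = integrate(lambda u: gaussian_kernel(u) ** 2)

print("C1:", c1)
print("univariate factor of C2:", c2_univariate)
print("closed form 1/(2*sqrt(pi)):", 1.0 / (2.0 * math.sqrt(math.pi)))
```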

2.3 Optimal Bandwidth Selector

In nonparametric regressions, bandwidth plays a crucial role in the estimation. Different bandwidth selection rules have been suggested in the literature. Among the selectors, the optimal bandwidth selector is the most comprehensively studied and is obtained by minimizing the mean integrated squared error (MISE). That is, for a model with $s$ thresholds, the corresponding MISE is defined as

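$$\mbox{MISE}(h)=\int\int\mbox{E}\left[\sum_{j=1}^{s+1}\left(\hat{m}_{\gamma_{j}}(\mathbf{x})-m_{\gamma_{j}}(\mathbf{x})\right)I_{\gamma_{j}}(q)\right]^{2}w(\mathbf{x})\,d\mathbf{x}\,dq$$

and then the optimal bandwidth selector is obtained from

$$h_{opt}=\arg\min_{h}\mbox{MISE}(h).$$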

The weighting function $w(\mathbf{x})$ is an indicator function selecting a particular $x$ -region of interest, and this depends generally on empirical studies. Since the threshold variable $q$ does not affect the convergence rate of the proposed estimator, we construct the weighting function without including the threshold variable. The convergence rate of $h_{opt}$ is derived and summarized in the following theorem.

Theorem 3.

Under Assumptions 1, 2, and 3, the convergence rate of the optimal bandwidth selector is $h_{opt}=O(n^{-\frac{1}{\delta}})$ in which $\delta=p+2r$ . $\square$

This result shows that the convergence rate of the optimal bandwidth selector depends on the number of covariates $p$ and order of continuous differentiability of the kernel function, but that the convergence rate is not affected by the number of thresholds. In other words, the additional thresholds do not worsen the curse-of-dimensionality problem.
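
As a worked illustration of the rate in Theorem 3 (simple arithmetic for the case $p=1$, $r=2$ mentioned in Section 2.1, not an additional result of the paper):

$$\delta=p+2r=5,\qquad h_{opt}=O(n^{-1/5}).$$

The simulations instead use a bandwidth proportional to $n^{-1/4.25}$, so that

$$\frac{n^{-1/4.25}}{n^{-1/5}}=n^{\frac{1}{5}-\frac{4}{17}}=n^{-3/85}\to 0\quad\text{as }n\to\infty.$$

Under these values, the bandwidth used in the simulations shrinks slightly faster than the MISE-optimal rate.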

3 Determining the Number of Thresholds

The number of thresholds and corresponding threshold values are typically unknown in practice. In this section, we thus present a procedure for determining the unknown number of thresholds and estimating the threshold values. In linear regressions with thresholds, the number of thresholds is commonly determined by carrying out a sequential significance test (see Hansen, 1997). This sequential test is conducted by comparing the estimated sum of the squared errors from a model with $s$ thresholds (under the null hypothesis) with that from a model with $s+1$ thresholds (under the alternative) sequentially. The number of thresholds is determined as $s$ when the null of $s-1$ thresholds versus the alternative of $s$ thresholds is rejected, whereas the null of $s$ thresholds versus the alternative of $s+1$ thresholds is not rejected. Similarly, we determine the number of thresholds in nonparametric regressions based on sequential tests in this study. Instead of comparing the estimated error sum of squares from the linear regressions, however, we use the significance test suggested by Aït-Sahalia et al. (2001) for the nonparametric regressions as the basis in the sequential tests. The test statistic for the null of $s$ thresholds against the alternative of $s+1$ thresholds is constructed and its asymptotic distribution is established as follows.
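
The sequential logic described above can be sketched independently of how each individual test is computed. In the illustration below, `rejects` is a hypothetical callback that returns `True` when the null of $s$ thresholds is rejected in favour of $s+1$. The cap `max_thresholds` is an added safeguard and is not part of the procedure as described.

```python
from typing import Callable


def select_number_of_thresholds(
    rejects: Callable[[int], bool], max_thresholds: int = 10
) -> int:
    """Return the smallest s whose null (s vs. s + 1 thresholds) is not rejected.

    If every null up to max_thresholds - 1 is rejected, the cap itself is
    returned.
    """
    for s in range(max_thresholds):
        if not rejects(s):
            return s
    return max_thresholds


# Stub decision rule for illustration: the nulls of 0, 1, and 2 thresholds are
# rejected and the null of 3 thresholds is not.
chosen = select_number_of_thresholds(lambda s: s < 3)
print(chosen)
```

Separating the stopping rule from the test itself makes it straightforward to swap in whichever statistic and critical values are used at each step.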

The test of Aït-Sahalia et al. (2001) is constructed to test the significance of a subset of covariates in a nonparametric regression. The intuition behind the test is to check the difference between the nonparametric regression estimates of unconstrained and constrained conditional means. That is, the null of the significance test is written as

$$H_{0}:\Pr[m(\mathbf{W},\mathbf{V})=m(\mathbf{W})]=1,$$

where $\mathbf{W}$ represents the $p$ -dimensional explanatory variables, $\mathbf{V}$ is the $q$ -dimensional explanatory variables under testing, $m(\mathbf{w},\mathbf{v})$ and $m(\mathbf{w})$ denote the conditional means under the alternative and null hypotheses, and $f(\mathbf{w},\mathbf{v})$ and $f(\mathbf{w})$ are the joint probability density functions of $(\mathbf{w},\mathbf{v})$ and $\mathbf{w}$ , respectively.

To test the null of $s$ thresholds versus the alternative of $s+1$, this test can be modified by taking $\mathbf{W}$ as the $p\times(s+1)$ independent variables in the regression with $s$ thresholds and $\mathbf{V}$ as the extra $p$ independent variables in the regression with $s+1$ thresholds. The significance of $\mathbf{V}$ implies that the regression with $s+1$ thresholds must be considered. However, the regression remains with $s$ thresholds if $\mathbf{V}$ is not significant. The details are discussed as follows. First, we construct the test for detecting whether an extra threshold at a known value $\tau_{j}$ exists in the $j$th regime. Second, since the threshold value $\tau_{j}$ is unknown in general, the test is extended to test whether an extra unknown threshold exists in the $j$th regime.

3.1 Testing for the Existence of an Extra Threshold

Given a regression with $s$ thresholds expressed as (2), a new threshold $\tau_{j}$ is suspected to exist in the $j$ th regime $[\gamma_{j-1},\gamma_{j})$ . Then, the conditional mean for the regime $[\gamma_{j-1},\gamma_{j})$ is split into two parts: $m_{\gamma_{j-1},\tau_{j}}(\mathbf{X}_{i})I_{\gamma_{j-1},\tau_{j}}(Q_{i})$ in the regime $[\gamma_{j-1},\tau_{j})$ and $m_{\tau_{j},\gamma_{j}}(\mathbf{X}_{i})I_{\tau_{j},\gamma_{j}}(Q_{i})$ in the regime $[\tau_{j},\gamma_{j})$ , where

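$$I_{\gamma_{j-1},\tau_{j}}(Q_{i})=\begin{cases}1,&Q_{i}\in[\gamma_{j-1},\tau_{j}),\\ 0,&\text{else},\end{cases}\qquad I_{\tau_{j},\gamma_{j}}(Q_{i})=\begin{cases}1,&Q_{i}\in[\tau_{j},\gamma_{j}),\\ 0,&\text{else},\end{cases}$$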

and $m_{\gamma_{j-1},\tau_{j}}(\mathbf{x})$ is defined as

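$$m_{\gamma_{j-1},\tau_{j}}(\mathbf{x})=\frac{\int y\,f_{\gamma_{j-1},\tau_{j}}(\mathbf{x},y)\,dy}{f_{\gamma_{j-1},\tau_{j}}(\mathbf{x})},$$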

and $m_{\tau_{j},\gamma_{j}}(\mathbf{x})$ is defined similarly to $m_{\gamma_{j-1},\tau_{j}}(\mathbf{x})$ .

Denote $E(Y|\mathbf{X},Q;\gamma_{1},\ldots,\gamma_{s})$ as the conditional mean with $s$ thresholds under the null and $E(Y|\mathbf{X},Q;\gamma_{1},\ldots,\gamma_{j-1},\tau_{j},\gamma_{j},\ldots,\gamma_{s})$ as the conditional mean function with $s+1$ thresholds under the alternative. Then, the null hypothesis for testing whether an extra threshold exists in the regime $[\gamma_{j-1},\gamma_{j})$ can be written as

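$$H_{0}:\Pr[E(Y\mid\mathbf{X},Q;\gamma_{1},\ldots,\gamma_{s})=E(Y\mid\mathbf{X},Q;\gamma_{1},\ldots,\gamma_{j-1},\tau_{j},\gamma_{j},\ldots,\gamma_{s})]=1.$$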

The sample statistic analogous to the test $\Gamma(\tau_{j})$ in Aït-Sahalia et al. (2001) is constructed as

[TABLE]

where $\hat{m}_{\gamma_{j}}(\mathbf{x})$ , $\hat{m}_{\gamma_{j-1},\tau_{j}}(\mathbf{x})$ , and $\hat{m}_{\tau_{j},\gamma_{j}}(\mathbf{x})$ are the sample estimates of $m_{\gamma_{j}}(\mathbf{x})$ , $m_{\gamma_{j-1},\tau_{j}}(\mathbf{x})$ , and $m_{\tau_{j},\gamma_{j}}(\mathbf{x})$ , respectively, and $a(\mathbf{X}_{i})$ is a weighting function. Specifically,

[TABLE]

The choice of $\mathbf{C}$ is application-dependent. For example, in an empirical analysis of options prices, $a(\mathbf{X})$ can be set to exclude those in-the-money options with price biases. Similarly, it can be set by using prior information to tackle boundary effects so that the density is bounded away from zero. Since $\tilde{\Gamma}(\tau_{j})$ is the weighted sum of the squares of the differences from $\hat{m}_{\gamma_{j}}(\mathbf{x})$ to $\hat{m}_{\gamma_{j-1},\tau_{j}}(\mathbf{x})$ and to $\hat{m}_{\tau_{j},\gamma_{j}}(\mathbf{x})$ , the null hypothesis, $\Gamma(\tau_{j})=0$ , is not rejected when $\tilde{\Gamma}(\tau_{j})$ is insufficiently large and is rejected when $\tilde{\Gamma}(\tau_{j})$ is sufficiently large. Therefore, this inference is a right-tailed test. The asymptotic distribution of $\tilde{\Gamma}(\tau_{j})$ is constructed as follows.

Theorem 4.

Under the null hypothesis and Assumptions 1, 2, and 3, the statistic $\tilde{\Gamma}(\tau_{j})$ is asymptotically normal:

$$\sigma^{-1}(\tau_{j})\left\{nh^{p/2}\tilde{\Gamma}(\tau_{j})-h^{-p/2}\xi(\tau_{j})\right\}\overset{d}{\longrightarrow}N(0,1),$$

where $\xi(\tau_{j})$ and $\sigma^{2}(\tau_{j})$ denote the bias and variance terms, respectively, and where the bias term is

$$\xi(\tau_{j})=C_{2}\left[\xi_{1}(\tau_{j})+\xi_{2}(\tau_{j})\right]$$

with

[TABLE]

$C_{2}$ was defined in Theorem 1, and the variance term is

$$\sigma^{2}(\tau_{j})=2C_{3}\left[\sigma^{2}_{1}(\tau_{j})+\sigma^{2}_{2}(\tau_{j})\right]$$

with

[TABLE]

where $\sigma^{2}_{\gamma_{j}}(\mathbf{x})$ , $\sigma^{2}_{\gamma_{j-1},\tau_{j}}(\mathbf{x})$ , and $\sigma^{2}_{\tau_{j},\gamma_{j}}(\mathbf{x})$ are

[TABLE]

and

[TABLE]

Note that Aït-Sahalia et al. (2001) also show that $C_{3}=1/(2\sqrt{2\pi})^{p}$ when the Gaussian product kernel is used. Given the result in Theorem 4, we denote

[TABLE]

and then the statistic for testing whether an extra threshold $\tau_{j}$ exists in the $j$th regime can be taken to be

[TABLE]

where $\hat{\sigma}^{2}$ and $\hat{\xi}$ are the consistent estimators for $\sigma^{2}$ and $\xi$, respectively. The limiting distribution of $\hat{\delta}(\tau_{j})$ is $N(0,1)$. The power property of $\hat{\delta}(\tau_{j})$ is investigated in Section 3.4; consequently, it is a consistent test. We describe the consistent estimation of $\sigma^{2}$ and $\xi$ in the following subsections.
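
Once $\tilde{\Gamma}(\tau_{j})$, $\hat{\xi}$, and $\hat{\sigma}$ are available, the decision step is mechanical. The sketch below assumes that the feasible statistic takes the standardized form of Theorem 4 with the estimates plugged in, and that the inputs have been computed elsewhere. The function names are hypothetical.

```python
from statistics import NormalDist


def standardized_statistic(gamma_tilde, xi_hat, sigma_hat, n, h, p):
    """sigma^{-1} * (n h^{p/2} Gamma_tilde - h^{-p/2} xi), with plug-in estimates."""
    return (n * h ** (p / 2) * gamma_tilde - h ** (-p / 2) * xi_hat) / sigma_hat


def reject_null(delta_hat, alpha=0.05):
    """Right-tailed decision against the N(0, 1) limiting distribution."""
    return delta_hat > NormalDist().inv_cdf(1.0 - alpha)


# Made-up inputs, chosen only so that the arithmetic is easy to follow:
# n h^{p/2} Gamma_tilde = 100 * 0.25 * 0.02 = 0.5 and h^{-p/2} xi = 4 * 0.5 = 2.
delta_hat = standardized_statistic(
    gamma_tilde=0.02, xi_hat=0.5, sigma_hat=2.0, n=100, h=0.25, p=2
)
print(delta_hat, reject_null(delta_hat))
```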

3.2 Testing for an Extra Unknown Threshold

In practice, $\tau_{j}$ is unknown a priori and there are, in principle, infinitely many candidate values of $\tau_{j}$ in the regime $[\gamma_{j-1},\gamma_{j})$. To make the test implementable, we consider only $m$ candidate threshold values within the regime $[\gamma_{j-1},\gamma_{j})$, i.e., $\gamma_{j-1}<\tau_{j,1}<\tau_{j,2}<\ldots<\tau_{j,m}<\gamma_{j}$, where $\tau_{j,1}-\gamma_{j-1}=\tau_{j,2}-\tau_{j,1}=\cdots=\gamma_{j}-\tau_{j,m}=(\gamma_{j}-\gamma_{j-1})/(m+1)$, since $m$ interior points leave $m+1$ equal gaps. Given the $m$ suspected pseudo thresholds, $\tau_{j,1},\tau_{j,2},\ldots,\tau_{j,m}$, the null hypothesis for testing an extra unknown threshold can be written as

[TABLE]

Given the sample counterpart $\tilde{\Gamma}(\tau_{j,i})$ of $\Gamma(\tau_{j,i}),i=1,\ldots,m,$ as defined in (9), the following theorem reports the joint asymptotic distribution of the $m$ statistics.

Theorem 5.

Given that Assumptions 1, 2, and 3 hold, $\mbox{E}(e^{2}_{i}|\mathbf{X}_{i}=\mathbf{x},Q_{i}=q)=\sigma^{2}(\mathbf{x},q)$ , and under the null,

[TABLE]

where

[TABLE]

and $\Sigma$ is the variance-covariance matrix of $\delta(\tau_{j,1}),\ldots,\delta(\tau_{j,m})$ . The $(l,k)$ -element in the variance-covariance matrix $\Sigma$ , assuming $\tau_{j,l}<\tau_{j,k}$ , is

[TABLE]

where $\varphi(\tau_{j,l},\tau_{j,k})$ is defined in the Appendix because of its complex form. $\square$

Theorem 5 is applicable to nonparametric regressions with heteroskedastic errors whose variances depend on the values of $\mathbf{X}_{i}$ and $Q_{i}$ , i.e., $\mbox{E}(e^{2}_{i}|\mathbf{X}_{i}=\mathbf{x},Q_{i}=q)=\sigma^{2}(\mathbf{x},q)$ . (For the two restricted cases with heteroskedastic errors whose variances depend on the values of $\mathbf{X}_{i}$ but not on those of $Q_{i}$ , i.e., $\mbox{E}(e^{2}_{i}|\mathbf{X}_{i}=\mathbf{x},Q_{i}=q)=\sigma^{2}(\mathbf{x})$ , when $\mathbf{X}$ and $Q$ are either dependent or independent, the joint asymptotic distribution of the $m$ statistics has also been derived but is not reported in this paper. The detailed results and proofs of the corresponding asymptotic distributions are available from the authors upon request.) By replacing $\sigma^{2}$ , $\xi$ , and $\Sigma$ in Theorem 5 with consistent estimates, namely $\hat{\sigma}^{2}$ , $\hat{\xi}$ , and $\hat{\Sigma}$ , respectively, we have

[TABLE]

where

[TABLE]

3.3 Estimation of the Nuisance Parameters

Given the asymptotic normality of the test statistic $\tilde{\Gamma}(\tau_{j})$ , the nuisance parameters must be estimated consistently. First, the parameter $\sigma^{2}_{\gamma_{j}}(\mathbf{x})$ can be estimated by using the Nadaraya-Watson estimator as follows:

[TABLE]

Thus, $\sigma^{2}$ , $\xi$ , and $\Sigma$ can be estimated as

[TABLE]

and

[TABLE]

Further, the $(i,j)$ th elements of $\Sigma$ can be estimated as

[TABLE]

where the terms $\hat{c}_{1}$ to $\hat{c}_{9}$ are

[TABLE]

Given Lemma 6 , Theorems 1 and 2, and Assumptions 1, 2 and 3, we have the following results as in Aït-Sahalia et al. (2001):

[TABLE]

and

[TABLE]

That is, $\hat{\xi}_{1}(\tau_{j,k})$ , $\hat{\xi}_{2}(\tau_{j,k})$ , $\hat{\sigma}^{2}_{1}(\tau_{j,k})$ , and $\hat{\sigma}^{2}_{2}(\tau_{j,k})$ are the consistent estimators of $\xi_{1}(\tau_{j,k})$ , $\xi_{2}(\tau_{j,k})$ , $\sigma^{2}_{1}(\tau_{j,k})$ , and $\sigma^{2}_{2}(\tau_{j,k})$ , respectively. For $C_{2}$ and $C_{3}$ , Aït-Sahalia et al. (2001) show that

[TABLE]

In light of the results in (26), the following test statistics are suggested to test the null of no extra unknown threshold existing in the regime $[\gamma_{j-1},\gamma_{j})$ :

[TABLE]

Furthermore, we know that each $\hat{\delta}^{*}(\tau_{j,i})$ converges in distribution to the standard normal distribution. Therefore, the limiting distribution of $Z_{\gamma_{j}}$ is also standard normal, i.e.,

[TABLE]

3.4 Local Alternative Power

In this subsection, we study the consistency of the test. We then examine its power, that is, the probability of rejecting a false hypothesis against the sequences of alternatives that approach the null as $n\rightarrow\infty$ . Given an extra threshold existing in $[\gamma_{j-1},\gamma_{j})$ and being neglected,

[TABLE]

for $q\in[\gamma_{j-1},\gamma_{j})$ . Suppose an extra threshold does exist in $[\gamma_{j-1},\gamma_{j})$ under the alternative and denote the sequence of densities as $f^{[n]}_{\gamma_{j}}$ , $f^{[n]}_{\gamma_{j-1},\tau_{j}}$ and $f^{[n]}_{\tau_{j},\gamma_{j}}$ . The superscript $[n]$ is specified to show that these densities are dependent on $n$ since the value of the extra threshold is unknown. The local alternatives can be specified as

[TABLE]

where

[TABLE]

and $\lambda_{\tau^{*},\tau_{j}}(\mathbf{x},q)$ satisfies

[TABLE]

and

[TABLE]

It is clear that the alternative $H_{1n}$ converges to the null $H_{0}$ at speed $n^{-1/2}h^{-p/4}$ (i.e., $\epsilon_{n}=n^{-1/2}h^{-p/4}$ ).

Theorem 6.

Under Assumptions 1, 2, and 3, the asymptotic power of the test is

[TABLE]

where $z_{\alpha}$ satisfies $\Phi(z_{\alpha})=1-\alpha$ and $\Phi(\cdot)$ is the CDF of a standard normal random variable. $\square$

3.5 Identifying the Number of Thresholds

The test statistic, the average norm $Z_{\gamma_{j}}$ , is suggested to check whether an extra threshold exists in the regime $[\gamma_{j-1},\gamma_{j})$ given that the $s$ threshold values $\gamma_{1},\ldots,\gamma_{s}$ are already known. Logically, the test can be applied to check for an extra threshold existing in the regime $[\gamma_{j-1},\gamma_{j})$ for $j=1,\ldots,s$ jointly. This thus ends up being the test for whether there is an extra threshold in a given $s$ threshold regression. Accordingly, we construct, in what follows, the test for the null of $s$ thresholds against the alternative of $s+1$ thresholds.

Since the regimes are disjoint, the indicator functions are mutually exclusive, i.e., $I_{\gamma_{i}}(Q_{i})\times I_{\gamma_{j}}(Q_{i})=0,i\neq j$ , and the covariance of $\delta(\tau_{i})$ and $\delta(\tau_{j})$ for $i\neq j$ is zero. That is

[TABLE]

This fact implies that $Z_{\gamma_{j}}$ and $Z_{\gamma_{l}}$ ( $j\neq l$ ) are asymptotically independent. The test statistic for the null of $s$ thresholds against $s+1$ thresholds is characterized in the following theorem.

Theorem 7.

Under the same assumptions as for Theorem 5, the test statistic for the null of $s$ thresholds against $s+1$ thresholds is constructed as

[TABLE]

with $\lim_{n\to\infty}P(F_{n}(s+1|s)\leq x)=\Phi^{s+1}(x)$ , where $\Phi(x)$ is the CDF of a standard normal distribution and $Z_{\gamma_{j}}$ is defined in equation (12). $\square$

Table 1 presents the critical values of the test statistic $F_{n}(s+1|s)$ for $s+1=1,2,3,4,5$ at 1%, 5% and 10%.
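The critical values follow directly from the limit law in Theorem 7: solving $\Phi^{s+1}(x)=1-\alpha$ gives $x=\Phi^{-1}\left((1-\alpha)^{1/(s+1)}\right)$ . The short sketch below evaluates this expression; the function name `critical_value` is introduced here for illustration only.

```python
from statistics import NormalDist


def critical_value(num_regimes: int, alpha: float) -> float:
    """Solve Phi(x) ** num_regimes = 1 - alpha for x.

    num_regimes plays the role of s + 1 in Theorem 7 and alpha is the
    nominal Type I error of the right-tailed test.
    """
    if num_regimes < 1 or not 0.0 < alpha < 1.0:
        raise ValueError("need num_regimes >= 1 and 0 < alpha < 1")
    return NormalDist().inv_cdf((1.0 - alpha) ** (1.0 / num_regimes))


# One row per value of s + 1, columns ordered as 10%, 5%, 1%.
for k in range(1, 5):
    row = [round(critical_value(k, a), 6) for a in (0.10, 0.05, 0.01)]
    print(k, row)
```

For $s+1=1$ the expression reduces to the usual one-sided standard normal quantiles.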

Given the test statistic for testing $s$ thresholds against $s+1$ thresholds in Theorem 7, the number of thresholds can be determined by conducting these tests sequentially for $s=0,1,\ldots$ until the null is not rejected. In other words, the number of thresholds is $s$ when the null of $s$ thresholds against $s+1$ thresholds is not rejected for the first time. Once the number of thresholds is determined, we estimate the corresponding threshold values by using the methods discussed in the next section.

4 Statistical Properties of the Threshold Estimators

In the preceding discussions on testing an extra unknown threshold in a certain regime and testing the null of $s$ thresholds against $s+1$ thresholds, the threshold values under the null are assumed to be known already. In applied research, the threshold values are unknown and need to be estimated by using a valid procedure. In the framework of linear regressions, Bai (1997) and Bai and Perron (1998) determine the number of structural changes by using a sequential test and estimate the breakpoints as the values that minimize the sum of squared errors. Hansen (1999) discusses the determination of the number of thresholds and estimation of threshold values in linear regressions by using similar procedures. We thus extend these procedures to the framework of nonparametric regressions.

4.1 Added Assumptions

To derive the statistical properties of the threshold value estimators, we need the following assumptions.

Assumption 4.

4-1. $f_{q}(q)$ , $\mbox{E}(c^{2}_{l,k}(\mathbf{X})|q)$ , and $\mbox{E}(c^{2}_{l,k}(\mathbf{X})e^{2}|q)$ exist and are continuous at $q=\gamma_{1},\ldots,\gamma_{s}$ , where $c_{l,k}(\mathbf{X}_{i}):=m_{\gamma_{l}}(\mathbf{X}_{i})-m_{\gamma_{k}}(\mathbf{X}_{i})$ .

4-2. $\max_{l,k\in[1,\ldots,s],l\neq k}\mbox{E}|c_{l,k}(\mathbf{X}_{i})|^{4}<\infty$ , $\mbox{E}|c_{l,k}(\mathbf{X}_{i})e_{i}|^{4}<\infty$ .

4-3. $\forall\gamma\in R$ , $\mbox{E}(|c^{4}_{l,k}(\mathbf{X}_{i})e^{4}_{i}||Q_{i}=\gamma)<D$ , $\mbox{E}(|c^{4}_{l,k}(\mathbf{X}_{i})||Q_{i}=\gamma)<D$ for some $D<\infty$ , and $f_{q}(\gamma)\leq\bar{f}<\infty$ .

4-4. $\delta_{n,l,k}(\mathbf{X}_{i})=n^{-\alpha}c_{l,k}^{*}(\mathbf{X}_{i})$ , $\int|c(\mathbf{x}_{i})|d\mathbf{x}_{i}\neq 0$ and $0<\alpha<1/2$ .

4-5. $nh^{2/p+2r}\to 0$ and $[(\ln(n))^{1/2}n^{\alpha}]/[n^{1/2}h^{p/2}]\to 0$ , where $0<\alpha<1/2$ .

Assumptions 4-1, 4-2, and 4-3 are standard in proving the consistency of the threshold estimators. Assumptions 4-4 and 4-5 relate to a condition called the small effect, $\delta_{n,l,k}(\cdot)$ , which is needed when we derive the asymptotic property of the threshold value estimator; see the proofs of Lemma 7 and Theorem 9. The small effect can approach zero when the sample size is sufficiently large; therefore, it depends on $n$ . $c_{l,k}^{*}(\mathbf{X}_{i})$ is the remainder of the difference between $m_{\gamma_{l}}(\mathbf{X}_{i})$ and $m_{\gamma_{k}}(\mathbf{X}_{i})$ when we extract the effect of the sample size, $n^{-\alpha}$ , from $c_{l,k}(\mathbf{X}_{i})$ .

4.2 Asymptotic Properties of the Threshold Value Estimators

Given that the number of thresholds $s$ is known, the estimator of the threshold values can be defined in a manner similar to that in Proposition 5 of Bai and Perron (1998):

[TABLE]

Clearly, $\hat{\gamma}_{1},\ldots,\hat{\gamma}_{s}$ are determined simultaneously by global minimization. In practice, the estimation is implemented by an algorithm based on the principle of dynamic programming. Under Assumptions 1, 2, 3, and 4, the following theorem establishes the consistency of $\hat{\gamma}_{j},j=1,\ldots,s$ .

Theorem 8.

For $j=1,\ldots,s$ ,

a) [TABLE]

b) [TABLE]

The convergence rate of $\hat{\gamma}_{j}$ is $n$ , which is a common result in the literature on structural changes and threshold models within the framework of linear regressions and linear quantile regressions (cf. Chen, 2008). The limiting distribution of the threshold value estimator is provided by Chan (1998) for linear models. By contrast, Hansen (2000) and Bai and Perron (2003) introduce the small effect to obtain a limiting distribution of the threshold value estimator that is free of nuisance parameters. That is, denote

[TABLE]

Under the assumption of $\delta_{n,l,k}(\mathbf{X}_{i})\to 0$ , which is called the small effect, we then obtain the asymptotic property of $\hat{\gamma}_{j}$ :

Theorem 9.

[TABLE]

where

[TABLE]

where

[TABLE]

and $B_{1,j}(\cdot)$ and $B_{2,j}(\cdot)$ are two independent Brownian motions. $\square$

Note that the convergence rate of $\hat{\gamma}_{j}$ under the existence of the small effect is slower than the rate in the case in which no small effect is assumed. The CDF of $Q_{j}$ can be obtained from Bhattacharya and Brockwell (1976), i.e., for $x\geq 0$ ,

[TABLE]

and for $x\leq 0$ , $P(Q_{j}\leq x)=1-P(Q_{j}\leq-x)$ , where $\Phi(x)$ is the CDF of a standard normal random variable.

4.3 Sequential Method

Instead of using a global minimization algorithm in the threshold value estimations, the sequential method can be adopted. Bai (1997) proposes the sequential method for estimating the change points in a linear regression with multiple structural changes and provides the proof of the consistency of his estimator without knowing the number of breaks. Bai and Perron (1998) also suggest using the sequential method to estimate the change points in linear regressions, while Hansen (1998) applies the sequential method to estimate the threshold values for nondynamic panel threshold models. Following the literature, we thus use the sequential method to estimate the threshold values in the nonparametric regressions. Without loss of generality, a nonparametric regression with three thresholds is considered. The model under consideration is, for $s=3$ ,

[TABLE]

The true threshold values implied by this model are $\gamma_{1},\gamma_{2}$ , and $\gamma_{3}$ , while $\gamma_{0}$ and $\gamma_{4}$ are the lower and upper bounds of the threshold values. However, a nonparametric regression is mis-specified when a model with one threshold is estimated as

[TABLE]

where $\hat{m}_{\gamma}(\mathbf{X}_{i})$ and $\hat{m}_{\gamma}^{*}(\mathbf{X}_{i})$ denote the kernel estimates from the sample observations with $Q_{i}\in(-\infty,\gamma]$ and $Q_{i}\in(\gamma,\infty)$ , respectively. The indicator function $I_{\gamma}(Q_{i})=1$ for $Q_{i}\in(-\infty,\gamma]$ and 0 otherwise.

Denote $SSR(\gamma)$ as the sum of the squared residuals from the nonparametric regression with the threshold value $\gamma$ . That is,

[TABLE]
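As a minimal sketch of how $SSR(\gamma)$ can be evaluated and minimized over a grid, the code below assumes a univariate regressor, a Gaussian kernel, and an in-sample (not leave-one-out) local constant fit on each side of the candidate threshold. The function names and the toy DGP with a single jump at $q=0.3$ are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def nw_fit(x_train, y_train, x_eval, h):
    """Local constant (Nadaraya-Watson) fit with a Gaussian kernel."""
    fitted = []
    for x0 in x_eval:
        w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in x_train]
        fitted.append(sum(wi * yi for wi, yi in zip(w, y_train)) / sum(w))
    return fitted


def ssr_one_threshold(x, y, q, gamma, h):
    """SSR when separate kernel fits are used for q <= gamma and q > gamma."""
    ssr = 0.0
    for in_regime in (lambda v: v <= gamma, lambda v: v > gamma):
        xs = [xi for xi, qi in zip(x, q) if in_regime(qi)]
        ys = [yi for yi, qi in zip(y, q) if in_regime(qi)]
        if xs:  # skip an empty regime
            fit = nw_fit(xs, ys, xs, h)
            ssr += sum((yi - fi) ** 2 for yi, fi in zip(ys, fit))
    return ssr


def estimate_threshold(x, y, q, grid, h):
    """Grid point that minimizes SSR(gamma)."""
    return min(grid, key=lambda g: ssr_one_threshold(x, y, q, g, h))


def simulate_toy_sample(n, seed):
    """Toy data (assumption): y = sin(x) + 2 * 1{q > 0.3} + noise."""
    rng = random.Random(seed)
    x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    q = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    y = [math.sin(xi) + (2.0 if qi > 0.3 else 0.0) + 0.3 * rng.gauss(0.0, 1.0)
         for xi, qi in zip(x, q)]
    return x, y, q


x, y, q = simulate_toy_sample(n=300, seed=1)
grid = [i / 10 for i in range(-8, 9)]
gamma_hat = estimate_threshold(x, y, q, grid, h=0.3)
print("estimated threshold:", gamma_hat)
```

A finer grid or a leave-one-out fit may be preferable in applications; the coarse grid here only keeps the example fast.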

Theorem 10.

Given a threshold value specified at $\gamma$ in a mis-specified nonparametric regression with one threshold, the model mis-specification error is

[TABLE]

where $b_{j}(\gamma)$ and $I_{\gamma_{j}}(\gamma)$ for $j=1,\ldots,4$ are defined in the Appendix. $\square$

Given the three true threshold values $\gamma_{1}$ , $\gamma_{2}$ , and $\gamma_{3}$ , the threshold value $\gamma$ of a mis-specified nonparametric regression with one threshold may be in $[\gamma_{0},\gamma_{1})$ , in $(\gamma_{1},\gamma_{2})$ , in $(\gamma_{2},\gamma_{3})$ , or in $(\gamma_{3},\gamma_{4}]$ . The model mis-specification error of the whole sample is $b_{1}(\gamma)$ , $b_{2}(\gamma)$ , $b_{3}(\gamma)$ , or $b_{4}(\gamma)$ if the threshold value is mis-specified at the regime $[\gamma_{0},\gamma_{1})$ , $(\gamma_{1},\gamma_{2})$ , $(\gamma_{2},\gamma_{3})$ , or $(\gamma_{3},\gamma_{4}]$ , respectively. In the Appendix, we describe the foregoing results in detail.

Theorem 11.

Let $S(\gamma_{1})=\min(S(\gamma_{1}),S(\gamma_{2}),S(\gamma_{3}))$ . Then $S(\gamma_{1})$ is the smallest model mis-specification error among all $\gamma\in[\gamma_{0},\gamma_{4}]$ . The exact expression of $S(\cdot)$ can be found in the Appendix. $\square$

$S(\gamma_{1}),S(\gamma_{2})$ , and $S(\gamma_{3})$ are the three smallest model mis-specification errors among all $\gamma\in[\gamma_{0},\gamma_{4}]$ . Moreover, since $S(\gamma)$ is the limit of $SSR(\gamma)$ in probability and, without loss of generality, $\min(S(\gamma_{1}),S(\gamma_{2}),S(\gamma_{3}))=S(\gamma_{1})$ is assumed, the following theorem shows that $S(\gamma_{1})$ is the global minimum. That is, Theorem 12 is sufficient to justify the sequential procedures discussed.

Theorem 12.

Assume that the true model is a nonparametric regression with three threshold values, namely $\gamma_{1}$ , $\gamma_{2}$ , and $\gamma_{3}$ , and that a nonparametric regression with one threshold is mis-specified and estimated via

[TABLE]

We then have

a) If $S(\gamma_{1})=\min(S(\gamma_{1}),S(\gamma_{2}),S(\gamma_{3}))$ , then $S(\gamma_{1})$ is the smallest model mis-specification error among all $\gamma\in[\gamma_{0},\gamma_{4}]$ .

b) $SSR(\hat{\gamma})\to S(\gamma_{1})$ .

c) $\hat{\gamma}$ will, with probability one, converge to $\gamma_{1}$ . $\square$

According to Theorem 12, even if the nonparametric regression is mis-specified with a single threshold, the threshold estimate that minimizes the sum of squared errors converges to the true threshold value at which the model mis-specification error is the smallest. The result of Theorem 12 is thus similar to those in the study by Bai and Perron (1998) for the estimation of the change points in a linear regression with multiple structural changes. To the best of our knowledge, this is the first theorem that ensures the consistency of the estimators obtained from using a sequential method in nonparametric regressions.

Note that the assumption $\min(S(\gamma_{1}),S(\gamma_{2}),S(\gamma_{3}))=S(\gamma_{1})$ indicates that the threshold value $\gamma_{1}$ has the largest influence on the regression. Theorem 12 can be extended to a mis-specified regression model with two threshold values, in which case the two estimated threshold values are consistent for the two true threshold values that have the larger impact on the regression. Based on Theorem 12, the determination of the number of thresholds and estimation of the threshold values can be obtained by using the following sequential procedure.

1. Implement the test for the null of $s=0$ against $s=1$ . That is, run the test to check whether an extra threshold exists in $(\gamma_{\min},\gamma_{\max})$ . If the null is not rejected, it is inferred that the regression has no threshold. If the null is rejected, move on to the next step.

2. Specify $s=1$ and estimate the threshold value as $\hat{\gamma}_{1}$ . Given $\hat{\gamma}_{1}$ , carry out the test for the null of $s=1$ against $s=2$ . That is, run the test to check whether an extra threshold exists in regimes $(\gamma_{\min},\hat{\gamma}_{1}]$ and $(\hat{\gamma}_{1},\gamma_{\max})$ . If the null is not rejected, it is inferred that the regression has one threshold. If the null is rejected, move on to the next step.

3. Specify $s=2$ and estimate a candidate for the extra threshold value in each of the regimes $(\gamma_{\min},\hat{\gamma}_{1}]$ and $(\hat{\gamma}_{1},\gamma_{\max})$ . Select as $\hat{\gamma}_{2}$ the candidate that yields the smaller sum of squared errors. Given $\hat{\gamma}_{1}$ and $\hat{\gamma}_{2}$ , carry out the test for the null of $s=2$ against $s=3$ . That is, run the test to check whether an extra threshold exists in regimes $(\gamma_{\min},\hat{\gamma}_{1}]$ , $(\hat{\gamma}_{1},\hat{\gamma}_{2}]$ , and $(\hat{\gamma}_{2},\gamma_{\max})$ if $\hat{\gamma}_{2}>\hat{\gamma}_{1}$ . If the null is not rejected, it is inferred that the regression has two thresholds. If the null is rejected, repeat the above steps until the null of $s$ against $s+1$ thresholds is not rejected.

When the procedure terminates, that is, when the null of $s$ thresholds against $s+1$ thresholds is not rejected, we settle on a nonparametric regression with $s$ thresholds. The estimates of the $s$ threshold values, $\hat{\gamma}_{1},\hat{\gamma}_{2},\ldots,\hat{\gamma}_{s}$ , are obtained as a byproduct of the procedure, and their consistency follows from Theorem 12.
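The control flow of the sequential procedure can be sketched as follows. The sketch assumes that the joint statistic is the maximum of the regime-level statistics, which is consistent with the limit law $\Phi^{s+1}(x)$ in Theorem 7 but is an assumption about the exact form of $F_{n}(s+1|s)$ . The callbacks `regime_stat` and `regime_estimate` are hypothetical placeholders for the regime-level test and the SSR-minimizing estimator, and the toy callbacks only exercise the loop.

```python
from statistics import NormalDist


def critical_value(num_regimes, alpha):
    """Quantile x solving Phi(x) ** num_regimes = 1 - alpha."""
    return NormalDist().inv_cdf((1.0 - alpha) ** (1.0 / num_regimes))


def sequential_thresholds(regime_stat, regime_estimate, lower, upper,
                          alpha=0.05, max_thresholds=10):
    """Test s against s + 1 thresholds and add one threshold per rejection.

    regime_stat(lo, hi)     -> statistic for an extra threshold in (lo, hi)
    regime_estimate(lo, hi) -> (candidate threshold, resulting SSR)
    Both callbacks are hypothetical and must be supplied by the user.
    """
    thresholds = []
    while len(thresholds) < max_thresholds:
        cuts = [lower] + sorted(thresholds) + [upper]
        regimes = list(zip(cuts[:-1], cuts[1:]))
        joint_stat = max(regime_stat(lo, hi) for lo, hi in regimes)
        if joint_stat <= critical_value(len(regimes), alpha):
            break  # null of s thresholds not rejected: stop
        # One candidate per regime; keep the one with the smallest SSR.
        candidates = [regime_estimate(lo, hi) for lo, hi in regimes]
        gamma_hat, _ = min(candidates, key=lambda c: c[1])
        thresholds.append(gamma_hat)
    return sorted(thresholds)


# Toy stand-ins (not estimates): threshold value -> made-up SSR ranking.
TOY = {0.5: 1.0, 0.15: 2.0, -0.7: 3.0}


def toy_stat(lo, hi):
    return 10.0 if any(lo < g < hi for g in TOY) else 0.0


def toy_estimate(lo, hi):
    inside = [(g, ssr) for g, ssr in TOY.items() if lo < g < hi]
    if not inside:
        return (lo + hi) / 2.0, float("inf")
    return min(inside, key=lambda c: c[1])


print(sequential_thresholds(toy_stat, toy_estimate, -1.0, 1.0))
```

Letting `alpha` shrink slowly with the sample size, as discussed for linear models by Bai and Perron (1998), would only change the `critical_value` call.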

As mentioned in Proposition 8 of Bai and Perron (1998), the drawback of the sequential method described above is that the determined number of thresholds exceeds the true number of thresholds with nonzero probability. Therefore, Bai and Perron (1998) recommend applying the sequential method with a Type I error that converges to zero slowly as the sample size grows. By doing so, the determined number of thresholds converges to the true number of thresholds.

5 Monte Carlo Studies

In this section, Monte Carlo studies are conducted to evaluate the performance of the proposed test statistic, $F_{n}(s+1|s)$ . We also conduct simulations to assess the finite sample performance of the sequential method for estimating the threshold values.

5.1 Empirical Performance of the Test Statistic

Monte Carlo simulations are designed to evaluate the empirical size and power performances of the tests to identify the number of thresholds. Our experimental design is mainly based on the data-generating process (DGP) considered in Aït-Sahalia et al. (2001). We consider the null of no threshold against the alternative with one threshold. The DGP under the null is specified as

[TABLE]

In this DGP, the random variable $X$ is dependent on the threshold variable $Q$ and the heteroskedasticity of the regression depends on $X$ and $Q$ . By using a univariate normal kernel function, we compute the bandwidth as $h=c\cdot\sigma\cdot n^{-1/\delta}=n^{-1/\delta}$ , where $\delta=4.25$ (cf. Aït-Sahalia et al., 2001, p.383), $c=1$ , and $\sigma$ is set to one in our simulation. We also conduct robustness checks on the bandwidth selection. Since $s=0$ under the null, the critical values of the test statistic $F_{n}(s+1|s)$ in Theorem 7 are 1.282, 1.645, and 2.326 for Type I errors at 10%, 5%, and 1%, respectively.
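For reference, the bandwidth rule $h=c\cdot\sigma\cdot n^{-1/\delta}$ can be evaluated as in the following sketch; the function name is illustrative, and the defaults mirror the simulation settings $c=1$ , $\sigma=1$ , and $\delta=4.25$ .

```python
def rule_of_thumb_bandwidth(n, sigma=1.0, c=1.0, delta=4.25):
    """Bandwidth h = c * sigma * n ** (-1 / delta)."""
    if n < 1:
        raise ValueError("sample size must be positive")
    return c * sigma * n ** (-1.0 / delta)


# Bandwidths for the sample sizes used in the size experiments.
for n in (500, 1000, 2000):
    print(n, round(rule_of_thumb_bandwidth(n), 4))
```

Robustness checks of the kind described above correspond to varying `c` around one.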

We conduct simulations with sample sizes of 500, 1000 and 2000. Throughout our simulations, the numbers of replications and partitions $m$ are set to be 1000 and 7, respectively. Table 2 presents the empirical sizes of $F_{n}(s+1|s)$ at 1%, 5%, and 10%, showing that the proposed test performs well with decent empirical sizes.

Table 3 shows the corresponding Monte Carlo results with the robustness checks on the choice of bandwidth. The proposed test maintains reasonable empirical sizes across the distinct bandwidth values.
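The empirical size is the rejection frequency of the test across replications generated under the null. The sketch below does not reproduce the DGP above; as a simplifying assumption, it draws directly from the limiting law in Theorem 7 (the maximum of $s+1$ independent standard normal variables) to illustrate how a rejection frequency is computed and compared with the nominal level.

```python
import random
from statistics import NormalDist


def rejection_rate(num_regimes, alpha, draws=20000, seed=0):
    """Monte Carlo rejection frequency under the limiting null law.

    Each draw is the maximum of num_regimes independent N(0, 1) variables,
    compared with the critical value solving Phi(x) ** num_regimes = 1 - alpha.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf((1.0 - alpha) ** (1.0 / num_regimes))
    rejections = 0
    for _ in range(draws):
        stat = max(rng.gauss(0.0, 1.0) for _ in range(num_regimes))
        rejections += stat > crit
    return rejections / draws


for alpha in (0.01, 0.05, 0.10):
    print(alpha, rejection_rate(num_regimes=1, alpha=alpha))
```

In a full experiment, each draw would be replaced by the statistic computed from one simulated sample of the DGP.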

5.2 Finite-sample Performance of the Sequential Method

To assess the accuracy of the sequential method for estimating the threshold values, we consider the following DGP in the Monte Carlo studies, which is similar to that in Aït-Sahalia et al. (2001, p.383):

[TABLE]

Let $\hat{\gamma}_{1,i}$ denote the threshold value estimate in the first-round identification from the $i$ th replication of the DGP. Then, the mean, standard error, and MSE (mean square error) from all the $nr$ replications are computed by

[TABLE]

Given $n=500,1000,3000$ and 1000 replications, Table 4 shows the Monte Carlo results. We can draw the following conclusions from the simulation results. The standard error and MSE of the estimated threshold values decrease as the sample size increases. The sequential method consistently estimates the unknown threshold values. In particular, the mean and standard error of the first estimated threshold values are 0.5029813 and 0.0107737, respectively. The mean value is close to $\gamma_{3}=0.5$ . For the second estimated threshold values, the mean is 0.152506, which is close to $\gamma_{2}=0.15$ . The mean of the third estimated threshold values is -0.6966892, which is close to $\gamma_{1}=-0.7$ . These simulated results indicate the accuracy of the sequential method for estimating the threshold values. Given the good performance of the simulations, and based on Theorems 8 and 9, the threshold value estimators are super-consistent, as we see in Hansen (2000).
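The summary statistics reported for the threshold estimates can be computed as in the sketch below. It assumes the divisor $nr$ for both the variance and the MSE, which may differ from the exact definitions used above, and the input list is made up purely to exercise the function.

```python
import math


def summarize_estimates(estimates, true_value):
    """Mean, standard error, and MSE of threshold estimates over replications.

    Assumption: both the variance and the MSE use the divisor nr.
    """
    nr = len(estimates)
    mean = sum(estimates) / nr
    variance = sum((g - mean) ** 2 for g in estimates) / nr
    mse = sum((g - true_value) ** 2 for g in estimates) / nr
    return mean, math.sqrt(variance), mse


# Made-up estimates around a true value of 0.5 (not simulation output).
mean, se, mse = summarize_estimates([0.49, 0.51, 0.50, 0.52], true_value=0.5)
print(mean, se, mse)
```

With this divisor, the MSE equals the variance plus the squared bias, which provides a quick consistency check on reported tables.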

6 An Empirical Application: the 401(k) Retirement Savings Plan with Income Thresholds

Examining the effects of 401(k) plans on savings is an issue of long-standing empirical interest (see Chernozhukov and Hansen (2004) and the references cited therein). Intuitively, because different income groups face distinct resource constraints, income thresholds should play an important role in the analysis of individual savings for retirement. Chernozhukov and Hansen (2013) study the effect of 401(k) eligibility on total wealth by using high-dimensional methods that allow for flexible functional forms. By using a sample of 9915, they generate 10,763 technical variables through a spline basis and polynomial basis and then select a few important variables out of the technical variables by using a LASSO-based double selection procedure. The selected few important variables include $\max(0,\ income-0.33)$ , where the $income$ variable is normalized on the $[0,1]$ interval. Their result suggests that the income threshold exists in the 401(k) study. In the literature, however, no test procedures have thus far been implemented to investigate the relevant income threshold values in 401(k) applications. In this section, we use our testing procedure to show that income thresholds indeed exist in 401(k) applications, and confirm that this finding is robust to functional form specifications.

To illustrate the testing procedure proposed in the preceding sections, we consider the estimation and inference of the thresholds associated with the effect of 401(k) eligibility on total wealth. 401(k) eligibility, the variable of interest, is an indicator of being eligible to enroll in a 401(k) plan (i.e., whether individual $i$ is working for a firm that offers access to a 401(k) plan). Poterba et al. (1994a, 1994b) and Chernozhukov et al. (2016) argue that 401(k) eligibility may be taken as exogenous conditional on income. Following Chernozhukov et al. (2016) and by using the data set in Chernozhukov and Hansen (2004), we thus construct both our outcome variable and the explanatory variable of interest after partialling out the effects of the other variables including the dummies for age, education, marital status, family size, and homeownership. The sample size is 9915. In the example presented herein, we consider the following nonparametric regression with $s$ thresholds:

[TABLE]

where the threshold variable is income, while $Y_{po}$ and $D_{po}$ are the partialled out total wealth and partialled out 401(k) eligibility, respectively.

We implement the test $F_{n}(s+1|s)$ in Theorem 7 to determine the number of thresholds and then estimate the corresponding threshold values by using the sequential method. The weighting function is constructed as $A(d)=\{d\in[-0.5,0.5]\}$ , and the bandwidth is $h=c\cdot\hat{\sigma}\times(9915)^{-1/4.25}$ , where $\hat{\sigma}=0.46$ and $c$ is set to 1. We first conduct a test for the null hypothesis that $s=0$ versus $s=1$ . The value of the test statistic is 50.46, so the null is rejected. The first-round estimated threshold value is $\hat{\gamma}_{1}=75,000.3$ (92nd percentile). Since there is only a small number of observations to the right of this threshold value, we conduct the next test, for the null hypothesis that $s=1$ versus $s=2$ , in the interval $[0,\ 75000.3]$ . The corresponding value of the test statistic is 27.34, which again rejects the null. The second-round estimated threshold value is $\hat{\gamma}_{2}=42,600$ (68th percentile). We now conduct the test for the null hypothesis that $s=2$ versus $s=3$ in the joint interval of $[0,\ 42600]$ and $[42600,\ 75000.3]$ . The value of the joint test statistic is 2.62. Thus, we reject the null, and then estimate the threshold value in this joint interval according to Theorem 12. We obtain $\hat{\gamma}_{3}=31,836$ (50th percentile). Since there are insufficient observations in the intervals $[31836,\ 42600]$ and $[42600,\ 75000.3]$ , we test for an extra threshold only in the interval $[0,\ 31836]$ . That is, we conduct the test for $s=3$ versus $s=4$ in the interval $[0,\ 31836]$ . Here, we do not reject the null because the test statistic, with the value 0.85, is less than the critical value. We also conduct robustness checks by using different bandwidth values with $c=1.05$ and $c=0.95$ . The corresponding three threshold values found are the same as those found with $c=1$ .
In short, our testing procedure allows us to identify four threshold regions, and the estimated income threshold values are \$31,836 (50th percentile), \$42,600 (68th percentile), and \$75,000.3 (92nd percentile). The crucial income threshold values are therefore all at or above the median income.

7 Conclusion

In this study, we identify the number of thresholds and estimate the threshold values for a nonparametric regression with multiple thresholds. The significance test of Aït-Sahalia et al. (2001) is modified to detect the existence of an extra threshold (i.e., $s$ versus $s+1$ thresholds). The asymptotic properties of the modified tests are then established. Based on the modified test, a procedure for determining the number of thresholds is suggested. Accordingly, we then carry out the sequential method to estimate the unknown threshold values. We also derive the asymptotic properties of the corresponding threshold value estimator. Our simulation results signify that the proposed estimators perform adequately in finite samples. To illustrate our testing procedure, we present an empirical analysis of the 401(k) plan with income thresholds.

Appendix

Proof of Theorem 1.

The kernel density estimator is defined by

[TABLE]

Suppose the kernel satisfies the conditions in Assumption 2 and is a second-order ( $r=2$ ) kernel function and that Assumptions 1-1 to 1-4 hold. Then, $\hat{f}_{\gamma_{j}}(\mathbf{x})$ has the expectation

[TABLE]

and the variance

[TABLE]

Assuming $M(n)$ satisfies

[TABLE]

we have

[TABLE]

By denoting $M_{1}=\max_{l\in[1,\ldots,M(n)]}Cov[\mathcal{K}_{h}(\mathbf{X}_{1}-\mathbf{x})I_{\gamma_{j}}(Q_{1}),\mathcal{K}_{h}(\mathbf{X}_{1+l}-\mathbf{x})I_{\gamma_{j}}(Q_{1+l})]$ , we obtain

[TABLE]

Denote $W_{ni}(\mathbf{x})=\mathcal{K}_{h}(\mathbf{X}_{i}-\mathbf{x})I_{\gamma_{j}}(Q_{i})-\mbox{E}[\mathcal{K}_{h}(\mathbf{X}_{i}-\mathbf{x})I_{\gamma_{j}}(Q_{i})]$ . For any $\delta>0$ , the upper bound of the covariance terms can be obtained by Lemma A.0 of Fan and Li (1999) as

[TABLE]

where $M_{2}$ is defined as

[TABLE]

Furthermore, given that Assumption 1-1 holds,

[TABLE]

By combining (34), (35) and (36), we have the variance of $\hat{f}_{\gamma_{j}}(\mathbf{x})$ as

[TABLE]

In general, if the $r$ th-order kernel function is considered, (32) becomes

[TABLE]

Given the results in (37) and (38) and that the bandwidth $h$ satisfies Assumption 3-1, the uniform almost sure convergence rate of a kernel density estimator can be obtained; see Lemma 2 and Lemma 8 in Stone (1983). Given the results in (32) and (37), and that Assumptions 1, 2, 3-1, and 5-2 hold, the asymptotic sampling distribution of $\hat{f}_{\gamma_{j}}(\mathbf{x})$ is derived by Masry (1996) and Li and Racine (2007). $\blacksquare$

Proof of Theorem 2.

Given a second-order kernel function as well as equations (32) and (37), we have

[TABLE]

Together with (39), the local constant estimator can be rewritten as

[TABLE]

Under the correct specification of a nonparametric regression with $s$ thresholds, the first term in the previous result is

[TABLE]

From Assumption 1-1, we have

[TABLE]

where $P[I_{\gamma_{j}}(Q_{i})\times I_{\gamma_{l}}(Q_{i})=0]=1,j\neq l$ ; $I_{\gamma_{j}}(Q_{i})\times I_{\gamma_{l}}(Q_{i})=I_{\gamma_{j}}(Q_{i}),j=l$ . Thus, (40) becomes

[TABLE]

Further, the asymptotic variance term is

[TABLE]

with

[TABLE]

Given that Assumption 1-1 holds, and from arguments similar to the proof for Theorem 1, the covariance term $V_{2}=o\left((nh^{p})^{-1}\right)$ . We have

[TABLE]

and the covariance terms are

[TABLE]

In general, when the kernel is an $r$ th-order kernel function, (40) becomes

[TABLE]

Given that (43), (45), and Assumption 3-1 hold, the result of part a) in Theorem 2 is verified based on Lemmas 2 and 8 of Stone (1983). Moreover, given (41), (43), (44), and that Assumption 3-1 holds, the result of part b) in Theorem 2 holds according to the central limit theorem; see Masry (1996) and Li and Racine (2007). $\blacksquare$

Proof of Theorem 3.

By substituting (43) and (45) into the mean integrated square error, we have the optimal bandwidth defined as

[TABLE]

Taking the first-order derivative of (46) with respect to $h$ ,

[TABLE]

we then have $h_{opt}=O(n^{\frac{-1}{2r+p}})$ . It is clear that the convergence rate of $h_{opt}$ depends on the dimension of $\mathbf{X}$ , $p$ , and the order of the kernel function, $r$ . It is worth noting that the convergence rate does not depend on the number of thresholds, $s$ . This result suggests that the bandwidth can be selected without considering the number of thresholds. $\blacksquare$
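The rate can be seen from a generic bias-variance trade-off. As a sketch, suppose the mean integrated square error has the usual form with a squared bias of order $h^{2r}$ and a variance of order $(nh^{p})^{-1}$ ; the positive constants $A$ and $B$ are placeholders introduced here and are not the exact expressions in (46).

```latex
\begin{align*}
\mathrm{MISE}(h) &\approx A\,h^{2r} + \frac{B}{n h^{p}}, \\
\frac{\partial\,\mathrm{MISE}(h)}{\partial h}
  &= 2rA\,h^{2r-1} - \frac{pB}{n h^{p+1}} = 0
  \;\Longrightarrow\;
  h_{opt} = \left(\frac{pB}{2rA}\right)^{\frac{1}{2r+p}} n^{-\frac{1}{2r+p}} .
\end{align*}
```

Neither exponent involves $s$ , so under this form the number of thresholds can affect only the constants and not the rate.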

Proof of Theorem 4.

Since

[TABLE]

we have

[TABLE]

Note that

[TABLE]

We need the following lemmas to complete the proof.

Lemma 1.

(Lemma 2 of Aït-Sahalia et al. (2001))

Defining

[TABLE]

where

[TABLE]

we have

[TABLE]

Lemma 2.

(Lemma 7 of Aït-Sahalia et al. (2001))

[TABLE]

with

[TABLE]

where $\alpha_{\gamma_{j}}(\mathbf{x})=\frac{y-m_{\gamma_{j}}(\mathbf{x})}{f_{\gamma_{j}}(\mathbf{x})}$ , $\alpha_{\gamma_{j-1},\tau_{j}}(\mathbf{x})=\frac{y-m_{\gamma_{j-1},\tau_{j}}(\mathbf{x})}{f_{\gamma_{j-1},\tau_{j}}(\mathbf{x})}$ , and $\alpha_{\tau_{j},\gamma_{j}}(\mathbf{x})=\frac{y-m_{\tau_{j},\gamma_{j}}(\mathbf{x})}{f_{\tau_{j},\gamma_{j}}(\mathbf{x})}$ . $\square$

Lemma 3.

(Hall, 1984).

Let $\{Z_{i};i=1,\ldots,n\}$ be an i.i.d. sequence. Suppose that the U-statistic $U_{n}=\sum_{1\leq i<j\leq n}\tilde{P}_{n}(Z_{i},Z_{j})$ has a symmetric kernel function $\tilde{P}_{n}$ that is centered (i.e., $\mbox{E}[\tilde{P}_{n}(Z_{1},Z_{2})]=0$) and degenerate (i.e., $\mbox{E}[\tilde{P}_{n}(Z_{1},Z_{2})|Z_{1}=z_{1}]=0$ almost surely for all $z_{1}$). Let

[TABLE]

Then, if

[TABLE]

we have that as $n\rightarrow\infty$

[TABLE]
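To make the objects in Lemma 3 concrete, the following minimal sketch computes a U-statistic by summing a symmetric kernel over all pairs $i<j$. The product kernel used here is a hypothetical choice: for mean-zero $Z_{i}$ it is centered and degenerate in the sense of the lemma, but because it does not vary with $n$ it only illustrates the definitions and is not claimed to satisfy the moment condition required for the normal limit.

```python
from itertools import combinations


def u_statistic(z, kernel):
    """U_n = sum of kernel(z_i, z_j) over all pairs i < j."""
    return sum(kernel(a, b) for a, b in combinations(z, 2))


def product_kernel(a, b):
    """Hypothetical symmetric kernel P(a, b) = a*b.
    If E[Z] = 0 then E[P(Z1, Z2)] = 0 and E[P(Z1, Z2) | Z1] = Z1 * E[Z2] = 0."""
    return a * b


z = [1.0, 2.0, 3.0]
# For the product kernel the pairwise sum equals ((sum z)^2 - sum z^2) / 2.
print(u_statistic(z, product_kernel))
```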

From Lemma 2, we have

[TABLE]

To prove this, denote $\Psi(t)$ as

[TABLE]

It can then be seen that when

[TABLE]

are specified, we have $\Psi(1)=\Gamma(\hat{f}_{\gamma_{j}},\hat{f}_{\gamma_{j-1},\tau_{j}},\hat{f}_{\tau_{j},\gamma_{j}},F)$ and $\Psi(0)=\Gamma(f_{\gamma_{j}},f_{\gamma_{j-1},\tau_{j}},f_{\tau_{j},\gamma_{j}},F)$ . Then, by the Taylor expansion,

[TABLE]

and thus it is equivalent to have

[TABLE]

where $t^{*}\in(0,1)$ . Denote

[TABLE]

so that $\Psi(t)$ can be written as

[TABLE]

where

[TABLE]

in which the first derivative of $\psi(t)$ is

[TABLE]

the second derivative of $\psi(t)$ is

[TABLE]

and the third derivative of $\psi(t)$ is

[TABLE]

It is clear that $\psi(0)=0$ under the null hypothesis. Therefore, under the null $\Gamma(f_{\gamma_{j}},f_{\gamma_{j-1},\tau_{j}},f_{\tau_{j},\gamma_{j}},F)$ , we have

[TABLE]

Given Lemma 1, $\Psi^{(3)}(t^{*})$ satisfies

[TABLE]

Given that Assumption 3-2 holds,

[TABLE]

For the term $\frac{\partial\psi(t)}{\partial t}|_{t=0}$ in $I_{n}$ , it is clear that

[TABLE]

in which

[TABLE]

Since $\int\frac{y-m_{\gamma_{j}}(\mathbf{x})}{f_{\gamma_{j}}(\mathbf{x})}f_{\gamma_{j}}(y,\mathbf{x})dy=0$ ,

[TABLE]

Similarly,

[TABLE]

and

[TABLE]

Therefore, we obtain

[TABLE]

Specifically,

[TABLE]

To simplify the expression, we denote

[TABLE]

and denote its demeaned version as

[TABLE]

Finally, the term $I_{n}$ can be expressed as

[TABLE]

In the above equation, the term $I_{n1}(\tau_{j})$ is asymptotically normal, $I_{n2}(\tau_{j})$ is the asymptotic bias, and $I_{n3}(\tau_{j})$ and $I_{n4}(\tau_{j})$ are asymptotically negligible.

In addition,

[TABLE]

and uniformly in $\mathbf{x}$ in $S$ from Assumption 3-2,

[TABLE]

Denote

[TABLE]

$I_{n2}(\tau_{j})$ can then be simplified to

[TABLE]

We then have

[TABLE]

and

[TABLE]

From Chebyshev’s inequality it follows that

[TABLE]

Let $Z_{i}=(Y_{i},\mathbf{X}_{i},Q_{i})$ and denote

[TABLE]

which verifies the centering and degeneracy conditions by construction. In addition, since

[TABLE]

we have

[TABLE]

which is the necessary condition for having Lemma 3 applicable.

As for $\sigma^{2}_{n}(\tau_{j})$ , we have

[TABLE]

Thus, the following are obtained:

[TABLE]

Hence,

[TABLE]

where

[TABLE]

According to Lemma 3 (Hall, 1984), we have

[TABLE]

From (48), we obtain

[TABLE]

and then

[TABLE]

from Chebyshev’s inequality

[TABLE]

and from (47), we have the following result for $I_{n4}(\tau_{j})$ :

[TABLE]

Note that this proof is established under the assumption that $\{(Y_{i},\mathbf{X}_{i},Q_{i}),i=1,\ldots,n\}$ are i.i.d. For mixing data with the $\beta$-coefficient as in Assumption 1-1, Aït-Sahalia et al. (2001), Fan and Li (1999), and Dette and Spreckelsen (2004) point out that this result also holds. $\blacksquare$

Proof of Theorem 5.

To begin with, we write out the term $\varphi(\tau_{j,l},\tau_{j,k})$ as follows:

[TABLE]

Let

[TABLE]

By definition,

[TABLE]

where

[TABLE]

and

[TABLE]

where

[TABLE]

and

[TABLE]

Denote

[TABLE]

Therefore, the variance-covariance is

[TABLE]

Proof of Theorem 6.

Let

[TABLE]

with

[TABLE]

It is clear that $\int\int|\delta_{\tau_{j}}(\mathbf{x},q)|d\mathbf{x}dq\neq 0$ when the alternative hypothesis is true.

As in the proof of Theorem 4, we know that

[TABLE]

where

[TABLE]

It is clear that $\psi(t)=0$ under the null and $\psi(t)\neq 0$ under the alternative. Then, under the alternative,

[TABLE]

and

[TABLE]

Given the following results in the proof of Theorem 4,

[TABLE]

we have

[TABLE]

Therefore, under the alternative,

[TABLE]

When the alternative converges to the null at the rate $n^{-1/2}h^{-p/4}$, we get $nh^{p/2}\int\int\psi(t)\frac{\partial\psi(t)}{\partial t}a(x)dF(x,q)=[O((nh^{2/p+2r})^{1/2})+O_{p}((nh)^{-1/2})]=o_{p}(1)$. Similarly, we have $nh^{p/2}\int\int\psi(t)\frac{\partial^{2}\psi(t)}{\partial t^{2}}a(x)dF(x,q)=o_{p}(1)$. Hence, Theorem 6 follows from Proposition 2 of Aït-Sahalia et al. (2001). $\blacksquare$

Proof of Theorem 7.

Observe that the indicator functions defined on distinct intervals are mutually exclusive. Therefore, the asymptotic covariance between the statistics $\delta(\tau_{j,l_{1}})$ and $\delta(\tau_{k,l_{2}})$ ($j\neq k$) is zero. In what follows, we verify this fact. Let $\tau_{j,l_{1}}$ and $\tau_{k,l_{2}}$ be the $l_{1}$th and $l_{2}$th splitting points in the intervals $[\gamma_{j-1},\gamma_{j})$ and $[\gamma_{k-1},\gamma_{k})$, respectively, with $j\neq k$. Following the proof of Theorem 4, we have

[TABLE]

where

[TABLE]

As defined in the proof of Theorem 4,

[TABLE]

Following the proof of Theorem 5, we denote

[TABLE]

and obtain

[TABLE]

The equation above shows that the indicators $a(i)$, $a(j)$, $c(i)$, and $c(j)$ are mutually exclusive; $b(i)$, $b(j)$, $d(i)$, and $d(j)$ are also mutually exclusive. Hence, $Cov(\delta(\tau_{j,l_{1}}),\delta(\tau_{k,l_{2}}))$ is $o_{p}(1)$. Further, since $\delta(\tau_{j,l_{1}})$ and $\delta(\tau_{k,l_{2}})$ are asymptotically normally distributed, they can be regarded as asymptotically independent. Accordingly, under the same assumptions as in Theorem 5, Theorem 7 holds. $\blacksquare$

Proof of Theorem 8.

With $s$ pseudo threshold values, $[\tau_{1},\ldots,\tau_{s}]$ in $[\gamma_{0},\gamma_{s+1}]$ , the conditional mean estimator is constructed as

[TABLE]

with

[TABLE]

and $\tau_{0}=\gamma_{0}$ , $\tau_{s+1}=\gamma_{s+1}$ .
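A minimal sketch of such an estimator is given below, assuming a scalar regressor ($p=1$), a Gaussian kernel, and regimes of the form $[\tau_{j-1},\tau_{j})$. The function names `regime_index` and `nw_by_regime` are hypothetical. Each observation is fitted by a local constant (Nadaraya-Watson) average that uses only the observations falling in the same regime.

```python
import numpy as np


def regime_index(q, taus):
    """Regime label for each q: the number of pseudo thresholds tau with tau <= q,
    so that regime j corresponds to q in [tau_j, tau_{j+1})."""
    return np.searchsorted(np.asarray(taus, dtype=float), q, side="right")


def nw_by_regime(x, y, q, taus, h):
    """Local constant fit of y on scalar x, computed separately within each regime.
    The Gaussian kernel and the bandwidth h are illustrative assumptions."""
    x, y, q = (np.asarray(a, dtype=float) for a in (x, y, q))
    regime = regime_index(q, taus)
    fitted = np.empty_like(y)
    for j in np.unique(regime):
        idx = regime == j
        u = (x[idx][:, None] - x[idx][None, :]) / h
        w = np.exp(-0.5 * u ** 2)          # kernel weights within regime j
        fitted[idx] = w @ y[idx] / w.sum(axis=1)
    return fitted


# Toy check: a regression function that is constant within each regime
# is reproduced exactly, whatever the bandwidth.
q = np.linspace(0.0, 1.0, 41)
x = np.linspace(-1.0, 1.0, 41)
y = np.where(q < 0.5, 1.0, 4.0)
print(np.allclose(nw_by_regime(x, y, q, taus=[0.5], h=0.2), y))
```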

To proceed, we need the following lemmas.

Lemma 4.

For any $[\tau_{1},\ldots,\tau_{s}]$ , we have

[TABLE]

and

[TABLE]

Proof: Since $\hat{m}_{\tau_{j}}(\mathbf{x})$ is a local constant estimator, its almost sure convergence rate is $O_{p}(h^{r}+(\ln(n))^{1/2}/(nh^{p})^{1/2})$ from the result of part a) in Theorem 2. From the definition of $m_{\tau_{j}}(\mathbf{x})$ ,

[TABLE]

Lemma 5.

Under the condition that $\mathbf{X}$ and $Q$ are exogenous, we have

[TABLE]

Proof: The second moment of $g(\mathbf{X}_{i},Q_{i})e_{i}$ exists, that is

[TABLE]

Since $E[g(\mathbf{X}_{i},Q_{i})e_{i}]=0$ when $\mathbf{X}$ and $Q$ are exogenous, the law of large numbers gives

[TABLE]

Let $d_{s,j}(\mathbf{X}_{i},Q_{i})=\hat{m}_{\tau_{j}}(\mathbf{X}_{i})I_{\tau_{j}}(Q_{i})-m_{\gamma_{j}}(\mathbf{X}_{i})I_{\gamma_{j}}(Q_{i})$ and $G_{\mathbf{X},Q}(\tau_{1},\ldots,\tau_{s})=E\left[\sum^{s+1}_{j=1}d_{s,j}(\mathbf{X}_{i},Q_{i})\right]^{2}$ . The estimated sum of squared residuals at threshold values $[\tau_{1},\ldots,\tau_{s}]$ is

[TABLE]

with

[TABLE]

Moreover,

[TABLE]

From Lemma 4, it is clear that $G_{\mathbf{X},Q}(\tau_{1},\ldots,\tau_{s})$ and $H(\tau_{1},\ldots,\tau_{s})$ attain their minimum at $\tau_{j}=\gamma_{j}$ for all $j\in\{1,\ldots,s\}$. According to Theorem 2.1 of Newey and McFadden (1994), we then have

[TABLE]

This completes the proof of part a) of Theorem 8. $\quad\blacksquare$
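The argument treats the threshold estimator as the minimiser of the sum of squared residuals over candidate threshold values. The sketch below illustrates this for a single threshold, assuming a scalar regressor, a Gaussian kernel, and simulated data with a true threshold at 0.5; the data-generating process, the bandwidth, and the function names are all hypothetical and are not the paper's design.

```python
import numpy as np


def nw_fit(x, y, h):
    """In-sample local constant (Nadaraya-Watson) fit with a Gaussian kernel."""
    u = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u ** 2)
    return w @ y / w.sum(axis=1)


def ssr_one_threshold(x, y, q, tau, h):
    """Sum of squared residuals when the sample is split at q < tau versus q >= tau."""
    left = q < tau
    resid = np.empty_like(y)
    resid[left] = y[left] - nw_fit(x[left], y[left], h)
    resid[~left] = y[~left] - nw_fit(x[~left], y[~left], h)
    return float(resid @ resid)


def estimate_threshold(x, y, q, h, min_obs=10):
    """Grid search over observed q values, keeping at least min_obs points per regime."""
    candidates = np.sort(q)[min_obs:-min_obs]
    ssr = [ssr_one_threshold(x, y, q, t, h) for t in candidates]
    return float(candidates[int(np.argmin(ssr))])


# Simulated example (assumed design): a jump of size 3 at q = 0.5.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0.0, 3.0, n)
q = rng.uniform(0.0, 1.0, n)
y = np.sin(x) + 3.0 * (q >= 0.5) + 0.1 * rng.standard_normal(n)
gamma_hat = estimate_threshold(x, y, q, h=0.3)
print(gamma_hat)
```

Restricting the candidates to observed values of $Q_{i}$ is a common computational shortcut, since the sum of squared residuals only changes when the split moves past a data point.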

For the proof of part b) of Theorem 8, without loss of generality, we provide the proof of $\hat{\gamma}_{2}\to^{p}\gamma_{2}$ in a nonparametric regression with three thresholds. Denote

[TABLE]

where $c_{2,3}(\mathbf{X}_{i})=m_{\gamma_{2}}(\mathbf{X}_{i})-m_{\gamma_{3}}(\mathbf{X}_{i})$ .

The following lemmas are needed for our proof.

Lemma 6.

Set $\bar{v}=\frac{8K}{\eta^{2}d^{2}_{1}(1-1/b)\epsilon}$ and

[TABLE]

There exist the constants $B>0$ , $0<d_{1},d_{2},d_{3}<\infty$ , and $0<c<\infty$ , such that for all $\eta>0$ and $\epsilon>0$ , there exists a $\bar{v}<\infty$ such that for all $n$ ,

[TABLE]

Proof: See Lemma A.7 of Hansen (2000). $\blacksquare$

Lemma 7.

For all $\eta>0$ and $\epsilon>0$ , there exists some $\bar{v}<\infty$ such that for any $B<\infty$ ,

[TABLE]

Proof: See Lemma A.8 of Hansen (2000). $\blacksquare$

Let $E_{n}$ be the intersection of the events $\max(|\hat{\gamma}_{1}-\gamma_{1}|,|\hat{\gamma}_{2}-\gamma_{2}|,|\hat{\gamma}_{3}-\gamma_{3}|)\leq B$ and $\sup|\hat{c}_{2,3}(\mathbf{X}_{i})-c_{2,3}(\mathbf{X}_{i})|\leq\kappa$. From Lemmas 6 and 7, we have

[TABLE]

Take $\eta$ and $\kappa$ to be sufficiently small such that

[TABLE]

We thus have

[TABLE]

Therefore,

[TABLE]

This result indicates that, on the event $E_{n}$, $SSR(\tau_{1},\tau_{2},\tau_{3})-SSR(\tau_{1},\gamma_{2},\tau_{3})>0$ when $\tau_{2}\in[\gamma_{2}+\bar{v}/n,\gamma_{2}+B]$ and when $\tau_{2}\in[\gamma_{2}-B,\gamma_{2}-\bar{v}/n]$. However, this contradicts the fact that $SSR(\tau_{1},\hat{\gamma}_{2},\tau_{3})-SSR(\tau_{1},\gamma_{2},\tau_{3})\leq 0$, which holds because $\hat{\gamma}_{2}$ minimizes the sum of squared residuals. Therefore, on $E_{n}$ we must have $|\hat{\gamma}_{2}-\gamma_{2}|\leq\bar{v}/n$. Since $P(E_{n})\geq 1-\epsilon$ for $n\geq\bar{n}$, it follows that $P(n|\hat{\gamma}_{2}-\gamma_{2}|>\bar{v})\leq\epsilon$ for $n\geq\bar{n}$. $\blacksquare$

Proof of Theorem 9.

The following lemmas are necessary for proving Theorem 9.

Lemma 8.

Given the existence of the small effect, $\delta_{n,l,k}(\mathbf{X}_{i})=a_{n}c_{l,k}(\mathbf{X}_{i})\to 0$ ,

[TABLE]

where $a_{n}=n^{1-2\alpha}$ . $\square$

Proof: The proof is similar to the one in part b) of Theorem 8. $\blacksquare$

We introduce some additional notation before stating the next lemma.

[TABLE]

Lemma 9.

Let $G_{n,2,3}(v)=a_{n}\sum^{n}_{i=1}c^{*2}_{2,3}(\mathbf{X}_{i})d_{2,i}(v)$ and $d_{2,i}(v)=I_{\gamma_{2}+v/a_{n},\gamma_{j}}(Q_{i})$ . We then have

[TABLE]

Proof: Since

[TABLE]

from Lemma A.2 of Hansen (2000)

[TABLE]

Therefore, $G_{n,2,3}(v)\to^{p}\mu_{2}|v|$ according to Chebyshev’s inequality. $\blacksquare$

Let $R_{n,2,3}(v)=\frac{\sqrt{a_{n}}}{\sqrt{n}}\sum^{n}_{i=1}c^{*}_{2,3}(\mathbf{X}_{i})e_{i}d_{2,i}(v)$ . We have the following functional central limit theorem:

Lemma 10.

[TABLE]

and $B(v)$ is a standard Brownian motion. $\square$

Proof: The variance of $R_{n,2,3}(v)$ is

[TABLE]

For any $M(n)\to\infty$ satisfying $M(n)/a_{n}\to 0$ , $V_{1n}$ is

[TABLE]

Furthermore, let

[TABLE]

We then have

[TABLE]

From Lemma A.0 of Fan and Li (1999), it can then be seen that

[TABLE]

where

[TABLE]

In addition,

[TABLE]

By combining (53), (54), and (55), we have

[TABLE]

Next, the big block and small block method is used to derive the asymptotic normality of $R_{n,2,3}(v)$ . Let $s_{n}$ and $l_{n}$ satisfy

[TABLE]

where $\alpha$ is the mixing coefficient of $(Y_{i},\mathbf{X}_{i},Q_{i})$ . Denote

[TABLE]

where $k_{n}=[\frac{n}{s_{n}+l_{n}}]$ and $[\cdot]$ denotes the Gauss bracket (integer part). Then $R_{n,2,3}(v)$ can be rewritten as

[TABLE]
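The decomposition can be visualised by partitioning the index set $\{1,\ldots,n\}$ into alternating blocks. The sketch below assumes, consistent with the variance calculations in this proof, that big blocks have length $l_{n}$, small blocks have length $s_{n}$, and whatever is left after $k_{n}$ such pairs forms a remainder block; the function name `block_partition` is hypothetical and indices are zero-based.

```python
def block_partition(n, l_n, s_n):
    """Split indices 0..n-1 into k_n alternating big (length l_n) and small
    (length s_n) blocks plus a remainder, with k_n = floor(n / (l_n + s_n))."""
    k_n = n // (l_n + s_n)
    big, small = [], []
    for j in range(k_n):
        start = j * (l_n + s_n)
        big.append(list(range(start, start + l_n)))
        small.append(list(range(start + l_n, start + l_n + s_n)))
    remainder = list(range(k_n * (l_n + s_n), n))
    return big, small, remainder


big, small, remainder = block_partition(n=23, l_n=5, s_n=2)
print(big)        # three big blocks of five indices each
print(small)      # three small blocks of two indices each
print(remainder)  # the two leftover indices
```

The small blocks separate the big blocks in time, which is what allows the big-block sums to be treated as approximately independent under mixing.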

The necessary conditions for applying a functional central limit theorem in a big and small block method include

[TABLE]

From (56), we have the variance $V(\eta_{j})=s_{n}v\lambda_{2}$ and then the variance $V(R^{{}^{\prime\prime}}_{n,2,3}(v))=n^{-1}k_{n}s_{n}v\lambda_{2}=\frac{s_{n}}{l_{n}+s_{n}}v\lambda_{2}=o(1)$. Similarly, we have $V(R^{{}^{\prime\prime\prime}}_{n,2,3}(v))=o(1)$. Therefore, it is clear that $(\ref{gammaNor1})$ holds. In addition, as $V(R^{{}^{\prime}}_{n,2,3}(v))=n^{-1}k_{n}l_{n}v\lambda_{2}=\frac{l_{n}}{l_{n}+s_{n}}\lambda_{2}v\to v\lambda_{2}$, it can be seen that (59) also holds.

From Proposition 2.6 of Fan and Yao (2003), we have

[TABLE]

and then $(\ref{gammaNor2})$ also holds. Furthermore, from Lemma 1 of Hansen (2000), and by letting $D_{1}=\max_{q\in R}E[m(\mathbf{X}_{i}e_{i}|Q_{i}=q)]$, we obtain

[TABLE]

and then (60) holds. From Lemma 3 of Hansen (1999), we have

[TABLE]

and then (61) also holds. Finally, combining equations (57) through (61), we have proved $(\lambda_{2})^{-1/2}R_{n,2,3}(v)\to^{d}B(v)$ .

Given Lemma 8, the probability of having $\hat{\gamma}_{2}$ in $(\gamma_{2}-\bar{v}/n,\gamma_{2}+\bar{v}/n)$ is $1-\epsilon$ . Denote $Q_{n}(v)=SSR(\tau_{1},\gamma_{2},\tau_{3})-SSR(\tau_{1},\gamma_{2}+v/a_{n},\tau_{3})$ . We consequently have

[TABLE]

and

[TABLE]

Given Lemmas 9 and 10, we have

[TABLE]

and then from Theorem 2.7 of Kim and Pollard (1990), we obtain Theorem 1 of Hansen (2000),

[TABLE]
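In Theorem 1 of Hansen (2000), the limiting random variable is the maximiser of a two-sided Brownian motion with a linear drift, $\arg\max_{r}[W(r)-|r|/2]$, multiplied by a scale factor. The sketch below simulates that variable on a truncated, discretised grid, on the assumption that the limit here takes the same form; the truncation width, the step size, and the function name `draw_argmax` are hypothetical choices, and the scale factor is omitted.

```python
import numpy as np


def draw_argmax(rng, half_width=40.0, step=0.05):
    """One draw of argmax_r [W(r) - |r|/2] on a grid over [-half_width, half_width],
    where W is a two-sided standard Brownian motion with W(0) = 0."""
    m = int(round(half_width / step))
    grid = step * np.arange(1, m + 1)
    # Independent Brownian paths on the positive and negative half-lines.
    w_pos = np.cumsum(rng.standard_normal(m)) * np.sqrt(step)
    w_neg = np.cumsum(rng.standard_normal(m)) * np.sqrt(step)
    r = np.concatenate((-grid[::-1], [0.0], grid))
    w = np.concatenate((w_neg[::-1], [0.0], w_pos))
    return float(r[np.argmax(w - 0.5 * np.abs(r))])


rng = np.random.default_rng(1)
draws = np.array([draw_argmax(rng) for _ in range(500)])
# The distribution is symmetric about zero, so the sample mean should be small.
print(draws.mean(), np.quantile(np.abs(draws), 0.95))
```

Discretisation and truncation both introduce approximation error, so quantiles obtained this way are only indicative.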

Note for Theorem 10.

[TABLE]

where

[TABLE]

with $I_{\gamma,\gamma_{1}}(Q_{i})=1$ for $Q_{i}\in[\gamma,\gamma_{1})$ and 0 otherwise, and

[TABLE]

with $I_{\gamma_{1},\gamma}(Q_{i})=1$ for $Q_{i}\in(\gamma_{1},\gamma]$ and 0 otherwise, $I_{\gamma,\gamma_{2}}(Q_{i})=1$ for $Q_{i}\in[\gamma,\gamma_{2})$ and 0 otherwise, and

[TABLE]

with $I_{\gamma_{2},\gamma}(Q_{i})=1$ for $Q_{i}\in(\gamma_{2},\gamma]$ and 0 otherwise, $I_{\gamma,\gamma_{3}}(Q_{i})=1$ for $Q_{i}\in[\gamma,\gamma_{3})$ and 0 otherwise, and

[TABLE]

with $I_{\gamma_{3},\gamma}(Q_{i})=1$ for $Q_{i}\in(\gamma_{3},\gamma]$ and 0 otherwise. $\square$

Graphic Description of Theorem 10.

[Figure: four panels, Case of $b(1)$ through Case of $b(4)$. Each panel marks $\gamma_{0}$, $\gamma_{1}$, $\gamma_{2}$, $\gamma_{3}$, $\gamma_{4}$ and the pseudo threshold $\gamma$, shows the fitted functions $\hat{m}_{\gamma}(\mathbf{x})$ and $\hat{m}_{\gamma}^{*}(\mathbf{x})$, and labels the mis-specification error segments (1) to (4) in Cases $b(1)$ and $b(4)$ and (1) to (5) in Cases $b(2)$ and $b(3)$.]

Given the three true threshold values $\gamma_{1}$, $\gamma_{2}$, and $\gamma_{3}$, the threshold value $\gamma$ of a mis-specified nonparametric regression with one threshold may lie in $[\gamma_{0},\gamma_{1})$, $(\gamma_{1},\gamma_{2})$, $(\gamma_{2},\gamma_{3})$, or $(\gamma_{3},\gamma_{4}]$. For $\gamma\in[\gamma_{0},\gamma_{1})$, there is no model mis-specification error for $Q_{i}\in[\gamma_{0},\gamma)$, but the mis-specification errors are

1. for $Q_{i}\in[\gamma,\gamma_{1})$: $\hat{m}_{\gamma}^{*}(\mathbf{x})-m_{\gamma_{1}}(\mathbf{x})$,
2. for $Q_{i}\in[\gamma_{1},\gamma_{2})$: $\hat{m}_{\gamma}^{*}(\mathbf{x})-m_{\gamma_{2}}(\mathbf{x})$,
3. for $Q_{i}\in[\gamma_{2},\gamma_{3})$: $\hat{m}_{\gamma}^{*}(\mathbf{x})-m_{\gamma_{3}}(\mathbf{x})$, and
4. for $Q_{i}\in[\gamma_{3},\gamma_{4}]$: $\hat{m}_{\gamma}^{*}(\mathbf{x})-m_{\gamma_{4}}(\mathbf{x})$,

as shown in the first graph, Case $b(1)$. For $\gamma\in(\gamma_{1},\gamma_{2})$, the mis-specification errors are

1. for $Q_{i}\in[\gamma_{0},\gamma_{1})$: $\hat{m}_{\gamma}(\mathbf{x})-m_{\gamma_{1}}(\mathbf{x})$, denoted as (1),
2. for $Q_{i}\in[\gamma_{1},\gamma)$: $\hat{m}_{\gamma}(\mathbf{x})-m_{\gamma_{2}}(\mathbf{x})$, denoted as (2),
3. for $Q_{i}\in[\gamma,\gamma_{2})$: $\hat{m}_{\gamma}^{*}(\mathbf{x})-m_{\gamma_{2}}(\mathbf{x})$, denoted as (3),
4. for $Q_{i}\in[\gamma_{2},\gamma_{3})$: $\hat{m}_{\gamma}^{*}(\mathbf{x})-m_{\gamma_{3}}(\mathbf{x})$, denoted as (4), and
5. for $Q_{i}\in[\gamma_{3},\gamma_{4}]$: $\hat{m}_{\gamma}^{*}(\mathbf{x})-m_{\gamma_{4}}(\mathbf{x})$, denoted as (5),

as shown in the second graph, Case $b(2)$. For $\gamma\in(\gamma_{2},\gamma_{3})$, the mis-specification errors are

1. for $Q_{i}\in[\gamma_{0},\gamma_{1})$: $\hat{m}_{\gamma}(\mathbf{x})-m_{\gamma_{1}}(\mathbf{x})$, denoted as (1),
2. for $Q_{i}\in[\gamma_{1},\gamma_{2})$: $\hat{m}_{\gamma}(\mathbf{x})-m_{\gamma_{2}}(\mathbf{x})$, denoted as (2),
3. for $Q_{i}\in[\gamma_{2},\gamma)$: $\hat{m}_{\gamma}(\mathbf{x})-m_{\gamma_{3}}(\mathbf{x})$, denoted as (3),
4. for $Q_{i}\in[\gamma,\gamma_{3})$: $\hat{m}_{\gamma}^{*}(\mathbf{x})-m_{\gamma_{3}}(\mathbf{x})$, denoted as (4), and
5. for $Q_{i}\in[\gamma_{3},\gamma_{4}]$: $\hat{m}_{\gamma}^{*}(\mathbf{x})-m_{\gamma_{4}}(\mathbf{x})$, denoted as (5),

as shown in the third graph, Case $b(3)$. For $\gamma\in(\gamma_{3},\gamma_{4}]$, the mis-specification errors are

1. for $Q_{i}\in[\gamma_{0},\gamma_{1})$: $\hat{m}_{\gamma}(\mathbf{x})-m_{\gamma_{1}}(\mathbf{x})$, denoted as (1),
2. for $Q_{i}\in[\gamma_{1},\gamma_{2})$: $\hat{m}_{\gamma}(\mathbf{x})-m_{\gamma_{2}}(\mathbf{x})$, denoted as (2),
3. for $Q_{i}\in[\gamma_{2},\gamma_{3})$: $\hat{m}_{\gamma}(\mathbf{x})-m_{\gamma_{3}}(\mathbf{x})$, denoted as (3), and
4. for $Q_{i}\in[\gamma_{3},\gamma)$: $\hat{m}_{\gamma}(\mathbf{x})-m_{\gamma_{4}}(\mathbf{x})$, denoted as (4),

as shown in the last graph, Case $b(4)$. Note that there is no model mis-specification error for $Q_{i}\in[\gamma,\gamma_{4}]$ in this case.

As to the cases of $\gamma=\gamma_{1}$ , $\gamma_{2}$ , or $\gamma_{3}$ , the model mis-specification errors are

[TABLE]

Proof of Theorems 10 and 11.

From Lemma 4,

[TABLE]

where

[TABLE]

and

[TABLE]

Denote $\gamma$ as a pseudo threshold value considered in a mis-specified nonparametric regression with one threshold and assume $\gamma\in[\gamma_{0},\gamma_{1})$ . From (62), we have

[TABLE]

Based on Lemma 5, the limit of the cross products of $e_{i}$ with the other terms in the above equation will be $o_{p}(1)$ . Note that the cross products among these terms converge to zero. Therefore, the limit of (63) is

[TABLE]

The limiting properties of $b_{2}(\gamma)$, $b_{3}(\gamma)$, and $b_{4}(\gamma)$ can be derived in the same manner. $\blacksquare$

Proof of Theorem 12.

The slope of $b_{1}(\gamma)$ for $\gamma\in[\gamma_{0},\gamma_{1})$ is

[TABLE]

The slope of $b_{2}(\gamma)$ for $\gamma\in[\gamma_{1},\gamma_{2})$ is

[TABLE]

The slope of $b_{3}(\gamma)$ for $\gamma\in[\gamma_{2},\gamma_{3})$ is

[TABLE]

Finally, the slope of $b_{4}(\gamma)$ for $\gamma\in[\gamma_{3},\gamma_{4})$ is

[TABLE]

From (64), the slope is a strictly decreasing function in $\gamma$ for $\gamma\in[\gamma_{0},\gamma_{1})$ . Thus, $S(\gamma_{1})$ is the smallest value of the model mis-specification error for $\gamma\in[\gamma_{0},\gamma_{1})$ . For $\gamma\in[\gamma_{1},\gamma_{2})$ , we denote

[TABLE]

$\forall\mathbf{x}_{i}\in R^{p}$ . The partial effect of $\gamma$ on $\pi(\mathbf{x}_{i})$ is

[TABLE]

We have $\int\frac{\partial\pi_{2}(\mathbf{x}_{i},\gamma)}{\partial\gamma}f(\mathbf{x}_{i},\gamma)d\mathbf{x}_{i}<0$. This result indicates that the minimum of $b_{2}(\gamma)$ is attained either at $\gamma_{1}$ or at $\gamma_{2}$, regardless of whether the initial value of $b_{1}(\gamma)$ is positive or negative. In other words, either $S(\gamma_{1})$ or $S(\gamma_{2})$ must be the minimal value of the model mis-specification error for $\gamma\in[\gamma_{1},\gamma_{2})$. In the same manner, either $S(\gamma_{2})$ or $S(\gamma_{3})$ must be the minimal value of the model mis-specification error for $\gamma\in[\gamma_{2},\gamma_{3})$. Finally, from (67), the slope is a strictly increasing function in $\gamma$ for $\gamma\in[\gamma_{3},\gamma_{4})$. This fact implies that the minimal value of the model mis-specification error on this interval is attained at $\gamma_{3}$ and equals $S(\gamma_{3})$. Therefore, the minimal value among $S(\gamma_{1})$, $S(\gamma_{2})$, and $S(\gamma_{3})$ is the global minimum of the model mis-specification error for $\gamma\in[\gamma_{0},\gamma_{4}]$. This proves part a) of Theorem 12.

Since $\min(S(\gamma_{1}),S(\gamma_{2}),S(\gamma_{3}))=S(\gamma_{1})$ is assumed, $S(\gamma_{1})$ is the global minimum of the model mis-specification error for $\gamma\in[\gamma_{0},\gamma_{4}]$ . Therefore, from Theorem 2.1 of Newey and McFadden (1994), we have

[TABLE]

This completes the proof of parts b) and c) in Theorem 12. $\blacksquare$
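The conclusion of part a) can be illustrated numerically in a deliberately simplified setting. The sketch below assumes that the regression function is constant within each regime (no dependence on $\mathbf{x}$), that $Q$ is uniform on $[0,1]$, and that the three true thresholds are 0.25, 0.5, and 0.75 with regime levels 0, 1, 3, and 4; all of these values and the function names are hypothetical. It evaluates the mis-specification error of a one-threshold fit over a grid of pseudo thresholds and locates the minimiser.

```python
def step_mean(q, thresholds, levels):
    """Piecewise-constant regression function: levels[j] on the j-th regime."""
    j = sum(q >= t for t in thresholds)
    return levels[j]


def misspec_error(gamma, thresholds, levels, n_grid=1000):
    """Approximate mis-specification error of a one-threshold fit at gamma for
    Q ~ Uniform[0, 1]: the mean squared deviation of the true step function
    from its average on each side of gamma."""
    qs = [(k + 0.5) / n_grid for k in range(n_grid)]
    total = 0.0
    for side in ([q for q in qs if q < gamma], [q for q in qs if q >= gamma]):
        vals = [step_mean(q, thresholds, levels) for q in side]
        if vals:
            mean = sum(vals) / len(vals)
            total += sum((v - mean) ** 2 for v in vals)
    return total / n_grid


thresholds = [0.25, 0.5, 0.75]   # assumed true thresholds
levels = [0.0, 1.0, 3.0, 4.0]    # assumed regime levels
grid = [k / 100 for k in range(1, 100)]
errors = [misspec_error(g, thresholds, levels) for g in grid]
gamma_star = grid[errors.index(min(errors))]
print(gamma_star, min(errors))
```

In this design the largest jump is at 0.5, and the grid minimiser coincides with that true threshold, in line with the claim that the global minimum is attained at one of $\gamma_{1}$, $\gamma_{2}$, $\gamma_{3}$.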

References

Aït-Sahalia, Y., Bickel, P.J., Stoker, T.M., 2001. Goodness-of-fit tests for kernel regression with an application to option implied volatilities, Journal of Econometrics 105, 363–412.

Angrist, J.D., Pischke, J., 2009. Mostly Harmless Econometrics, New Jersey: Princeton University Press.

Bai, J., 1997. Estimating multiple breaks one at a time, Econometric Theory 13, 315–352.

Bai, J., Perron, P., 1998. Estimating and testing linear models with multiple structural changes, Econometrica 66, 47–78.

Bai, J., Perron, P., 2003. Computation and analysis of multiple structural change models, Journal of Applied Econometrics 18, 1–22.

Bhattacharya, P.K., Brockwell, P.J., 1976. The minimum of an additive process with applications to signal estimation and storage theory, Z. Wahrschein. Verw. Gebiete 37, 51–75.

Chan, K.S., 1993. Consistency and limiting distribution of the least squares estimator of a threshold autoregressive model, The Annals of Statistics 21, 520–533.

Chen, B., Hong, Y., 2012. Testing for smooth structural changes in time series models via nonparametric regression, Econometrica 80, 1157–1183.

Chen, B., Hong, Y., 2013. Nonparametric testing for smooth structural change in panel data models, Working Paper, Department of Economics, University of Rochester.

Chen, J.-E., 2008. Estimating and testing quantile regression with structural changes, Working Paper, Department of Economics, NYU.

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., 2016. Double machine learning for treatment and causal parameters, Cemmap Working Paper CWP49/16.

Chernozhukov, V., Hansen, C., 2004. The impact of 401(k) participation on the wealth distribution: An instrumental quantile regression analysis, Review of Economics and Statistics 86, 735–751.

Chernozhukov, V., Hansen, C., 2013. High-Dimensional Methods: Examples for Inference on Structural Effects, NBER Summer Institute.

Dette, H., Spreckelsen, I., 2004. Some comments on specification tests in nonparametric absolutely regular processes, Journal of Time Series Analysis 25, 159–172.

Fan, Y., Li, Q., 1999. Central limit theorem for degenerate U-statistics of absolutely regular processes with applications to model specification testing, Journal of Nonparametric Statistics 10, 245–271.

Fan, J., Yao, Q., 2003. Nonlinear Time Series: Nonparametric and Parametric Methods, New York: Springer-Verlag.

Hall, P., 1984. Central limit theorem for integrated squared error of multivariate nonparametric density estimators, Journal of Multivariate Analysis 14, 1–16.

Hansen, B.E., 1999. Threshold effects in non-dynamic panels: Estimation, testing, and inference, Journal of Econometrics 93, 345–368.

Hansen, B.E., 2000. Sample splitting and threshold estimation, Econometrica 68, 575–603.

Henderson, D.J., Parmeter, C.F., Su, L., 2014. Nonparametric threshold regression: Estimation and inference, Working Paper, Department of Economics, University of Miami.

Li, Q., Racine, J.S., 2007. Nonparametric Econometrics: Theory and Practice, Princeton, NJ: Princeton University Press.

Masry, E., 1996. Multivariate regression estimation: Local polynomial fitting for time series, Stochastic Processes and their Applications 65, 81–101.

Masry, E., Fan, J., 1997. Local polynomial estimation of regression functions for mixing processes, Scandinavian Journal of Statistics 24, 165–179.

Newey, W.K., McFadden, D.L., 1994. Large sample estimation and hypothesis testing, Handbook of Econometrics: Vol. IV, ed. by R.F. Engle and D.L. McFadden, New York: Elsevier, 2113–2245.

Oka, T., Qu, Z., 2011. Estimating structural changes in regression quantiles, Journal of Econometrics 162, 248–267.

Poterba, J.M., Venti, S.F., Wise, D.A., 1994a. 401(k) plans and tax-deferred savings, Studies in the Economics of Aging, Chicago: University of Chicago Press, 105–142.

Poterba, J.M., Venti, S.F., Wise, D.A., 1994b. Do 401(k) contributions crowd out other personal saving?, Journal of Public Economics 58, 1–32.

Qu, Z., 2008. Testing for structural change in regression quantiles, Journal of Econometrics 146, 170–184.

Qu, Z., Perron, P., 2007. Estimating and testing structural changes in multivariate regressions, Econometrica 75, 459–502.

Stone, C.J., 1983. Optimal uniform rate of convergence for nonparametric estimators of a density function or its derivatives, Recent Advances in Statistics, 393–406. Academic Press, New York.

Su, L., Xiao, Z., 2008. Testing structural change in time-series nonparametric regression models, Statistics and Its Interface 1, 347–366.

Yu, P., Phillips, P.C.B., 2015. Threshold regression with endogeneity, Cowles Foundation Discussion Paper no. 1966.
