State-domain Change Point Detection for Nonlinear Time Series Regression

Yan Cui; Jun Yang; Zhou Zhou

arXiv:1904.11075·stat.ME·November 22, 2021

TL;DR

This paper introduces a nonparametric method for detecting and estimating change points in the state domain of nonlinear time series, extending traditional time domain approaches with kernel-based techniques.

Contribution

It proposes a novel density-weighted anti-symmetric kernel method for change point detection in the state domain of nonlinear time series, including estimation of their number and locations.

Findings

1. Theoretical validation of the detection and estimation procedures.

2. Effective change point detection demonstrated on real data.

3. Method outperforms existing approaches in nonlinear settings.

Abstract

Change point detection in time series has attracted substantial interest, but most of the existing results have been focused on detecting change points in the time domain. This paper considers the situation where nonlinear time series have potential change points in the state domain. We apply a density-weighted anti-symmetric kernel function to the state domain and therefore propose a nonparametric procedure to test the existence of change points. When the existence of change points is affirmative, we further introduce an algorithm to estimate the number of change points together with their locations. Theoretical results of the proposed detection and estimation procedures are given and a real dataset is used to illustrate our methods.

Tables

Table 1. Simulated type I error rates for Model A with the first order ADCF of the model.

Model A	$κ_{1}$	0.2	0.4	0.6	0.8
	$ℛ (1)$	0.240	0.321	0.412	0.523
$α = 0.05$	$n = 200$	0.046	0.055	0.064	0.067
	$n = 500$	0.048	0.052	0.060	0.065
	$n = 800$	0.053	0.049	0.050	0.065
$α = 0.1$	$n = 200$	0.103	0.109	0.132	0.131
	$n = 500$	0.096	0.092	0.119	0.138
	$n = 800$	0.099	0.092	0.109	0.126

Table 2. Simulated type I error rates for Models B–E. For Model E, the first and seventh order ADCF are $ℛ (1) = 0.195$ and $ℛ (7) = 0.258$.

	Model	B	C	D	E
$α = 0.05$	$n = 200$	0.040	0.044	0.050	0.060
	$n = 500$	0.066	0.041	0.054	0.054
	$n = 800$	0.056	0.051	0.057	0.054
$α = 0.1$	$n = 200$	0.085	0.106	0.124	0.105
	$n = 500$	0.102	0.092	0.114	0.092
	$n = 800$	0.093	0.101	0.112	0.095

Table 3. Accuracy in estimating the change-point locations and the percentage of correctly estimating the number of change points.

Case 1	$n$	MADE	MSE	Percentage
$a_{1} = 0$	200	0.0451	0.0055	90.51%
	500	0.0195	0.0014	93.77%
	800	0.0134	0.0006	94.51%
Case 2	$n$	MADE	MSE	Percentage
$a_{1} = - 0.3$	200	0.0519	0.0059	81.82%
$a_{2} = 0$	200	0.0757	0.0069	81.82%
$a_{1} = - 0.3$	500	0.0508	0.0043	86.59%
$a_{2} = 0$	500	0.0496	0.0042	86.59%
$a_{1} = - 0.3$	800	0.0386	0.0028	89.80%
$a_{2} = 0$	800	0.0362	0.0024	89.80%

Table 4. Simulated rejection rates for testing change point with TAR(1) model.

$κ_{2}$		0.5	0.3	0.1	$- 0.1$	$- 0.3$	$- 0.5$
Para.	$α = 0.05$	0.042	0.175	0.831	0.904	1	1
Para.	$α = 0.1$	0.095	0.282	0.897	0.906	1	1
Nonpara.	$α = 0.05$	0.069	0.256	0.406	0.646	0.792	0.910
Nonpara.	$α = 0.1$	0.131	0.378	0.540	0.761	0.861	0.940

Table 5. Estimation accuracy for change-point locations.

	$n$	MADE	MSE
Nonpara.	200	0.1055	0.0143
	500	0.0519	0.0066
	800	0.0367	0.0041
Para.	200	0.0340	0.0027
	500	0.0178	0.0012
	800	0.0098	0.0004

Equations

$$X_{i}=\mu(X_{i-1})+\epsilon_{i},$$

$$\mathrm{d}X_{t}=\mu(X_{t})\,\mathrm{d}t+\mathrm{d}\mathbb{M}(t),$$

$$X_{i}=f(i/n)+\varepsilon_{i},\quad i=1,2,\dots,n$$

$$\mu(x)=\sum_{j=0}^{M}\mu_{j}(x)\mathbbm{1}(a_{j}\leq x<a_{j+1}),$$

$$\epsilon_{i}=G^{*}(\xi_{i}),$$

$$X_{i}=G(\xi_{i}),$$

$$X_{n}^{\prime}=G(\xi_{n}^{\prime}),\quad\xi_{n}^{\prime}:=(\xi_{-1},\eta_{0}^{\prime},\eta_{1},\dots,\eta_{n}),$$

$$\theta_{n,p}=\|X_{n}-X_{n}^{\prime}\|_{p}.$$

$$\theta_{n,p}^{*}=\|\epsilon_{n}-\epsilon_{n}^{\prime}\|_{p},$$

$$\tilde{K}_{n}(X,x,b):=\frac{w_{n}^{*}(x,b)\,K\left(\frac{X-x}{b}\right)-w_{n}(x,b)\,K^{*}\left(\frac{X-x}{b}\right)}{w_{n}(x,b)\,w_{n}^{*}(x,b)},$$

$$w_{n}(x,b):=\frac{1}{nb}\sum_{i=1}^{n}K\left(\frac{X_{i}-x}{b}\right),\qquad w_{n}^{*}(x,b):=\frac{1}{nb}\sum_{i=1}^{n}K^{*}\left(\frac{X_{i}-x}{b}\right),$$

$$\sup_{x}\left[|f_{X_{n}\mid\xi_{n-1}}(x)|+|f^{\prime}_{X_{n}\mid\xi_{n-1}}(x)|+|f^{\prime\prime}_{X_{n}\mid\xi_{n-1}}(x)|\right]\leq B,\quad\text{a.s.}$$

$$t_{n}(x):=\frac{\sqrt{f(x)}}{\sigma(x)}\frac{1}{nb}\sum_{k=1}^{n}\tilde{K}_{n}(X_{k-1},x,b)X_{k},$$

$$f_{n}(x)=\frac{1}{nh}\sum_{k=2}^{n}W\left(\frac{X_{k-1}-x}{h}\right),$$

$$\mu_{n}(x)=\frac{1}{nhf_{n}(x)}\sum_{k=2}^{n}W\left(\frac{X_{k-1}-x}{h}\right)X_{k}$$

$$\sigma_{n}^{2}(x)=\frac{1}{nhf_{n}(x)}\sum_{k=2}^{n}W\left(\frac{X_{k-1}-x}{h}\right)\hat{e}_{k}^{2}.$$

$$\mathbb{E}f_{n}(x)-f(x)=f^{\prime\prime}(x)h^{2}\psi_{W}+o(h^{2}),$$

$$\sup_{x}|f_{n}(x)-f(x)|=\mathcal{O}_{\mathbb{P}}\left(\sqrt{\frac{(\log n)^{3}}{nh}}+h^{2}\log n\right).$$

$$\sup_{x}|\sigma_{n}^{2}(x)-\sigma^{2}(x)|=\mathcal{O}_{\mathbb{P}}\left(\sqrt{\frac{(\log n)^{3}}{nh}}+h^{2}\log n\right).$$

$$0<\delta_{1}<1/3,\quad 0<\delta_{2}\leq 1/4,\quad nb^{9}\log n=o(1),$$

$$\mathbb{P}\left(\sqrt{\frac{nb}{2\lambda_{K}}}\sup_{x\in T\cap(T_{a}^{b})^{c}}|t_{n}(x)|-d_{n}\leq\frac{z}{(2\log\bar{b}^{-1})^{1/2}}\right)\to e^{-2e^{-z}},$$

$$d_{n}:=(2\log\bar{b}^{-1})^{1/2}+\frac{1}{(2\log\bar{b}^{-1})^{1/2}}\log\frac{\sqrt{K_{2}}}{2\pi}$$

$$\mathbb{P}\left(\sqrt{\frac{nb}{2\lambda_{K}}}\sup_{x\in T\cap(T_{a}^{b})^{c}}|t_{n}^{*}(x)|-d_{n}\leq\frac{z}{(2\log\bar{b}^{-1})^{1/2}}\right)\to e^{-2e^{-z}}.$$

$$\mathbb{P}\left(\sqrt{\frac{nb}{2\lambda_{K}}}\sup_{x\in T}|t_{n}^{*}(x)|-d_{n}\leq\frac{z}{(2\log\bar{b}^{-1})^{1/2}}\right)\to e^{-2e^{-z}}.$$

$$\mathbb{P}\left(\sup_{x\in T}|t_{n}(x)|\geq\sqrt{\frac{2\lambda_{K}}{nb}}\left[d_{n}-\frac{\log\{\log(1-\alpha)^{-1/2}\}}{(2\log\bar{b}^{-1})^{1/2}}\right]\right)\to 1.$$

$$\mathbb{P}\left(\{\hat{M}=M\}\cap\left\{\max_{1\leq i\leq M}|\hat{a}_{i}-a_{i}|<c_{n}\right\}\right)\to 1-\alpha,$$

$$\Pi_{n}^{*}=\sup_{x\in T}\left|\frac{\sqrt{g(x)}}{nb}\sum_{k=1}^{n}\tilde{K}_{n}(U_{k-1},x,b)U_{k}\right|,$$

$$0<\delta_{1}<1/3,\quad 0<\delta_{2}\leq 1/4,\quad nb^{9}\log n=o(1).$$

$$\mathbb{P}\left(\sqrt{\frac{nb}{2\lambda_{K}}}\Pi_{n}^{*}-d_{n}\leq\frac{z}{(2\log\bar{b}^{-1})^{1/2}}\right)\to e^{-2e^{-z}},\quad\text{as }n\to\infty.$$

$$\mathrm{CV}(b)$$

Taxonomy

Topics: Control Systems and Identification · Advanced Control Systems Optimization

Full text

State-domain Change Point Detection for Nonlinear Time Series Regression

Yan Cui$^{1,2}$, Jun Yang$^{3}$ and Zhou Zhou$^{1}$

1 Department of Statistical Sciences, University of Toronto, Canada

2 Institute for Advanced Study in Mathematics, Harbin Institute of Technology, China

3 Department of Statistics, University of Oxford, United Kingdom

{cui,zhou}@utstat.toronto.edu

[email protected]

Key words: Change-point detection; Nonlinear time series; Nonparametric hypothesis test; State domain.

1. Introduction

Consider the following state-domain nonlinear auto-regression

$$X_{i}=\mu(X_{i-1})+\epsilon_{i},\tag{1}$$

where $\mu(\cdot)$ is an unknown regression function, $\{\epsilon_{i}\}$ is a martingale difference sequence such that $\mathbb{E}[\epsilon_{i}\mid(\epsilon_{i-1},\epsilon_{i-2},\cdots)]=0$ almost surely. Special cases of Eq. 1 include threshold AR models (Tong, 1990), exponential AR models (Haggan & Ozaki, 1981) and ARCH models (Engle, 1982), among others. Furthermore, Eq. 1 can be viewed as a discretized version of the diffusion model

$$\mathrm{d}X_{t}=\mu(X_{t})\,\mathrm{d}t+\mathrm{d}\mathbb{M}(t),\tag{2}$$

where $\mu(\cdot)$ is the instantaneous return or drift function, and $\{\mathbb{M}(t)\}$ is a continuous-time martingale. In the literature, the special case of Eq. 2 with $\mathrm{d}\mathbb{M}(t)=\sigma(X_{t})\mathrm{d}\mathbb{B}(t)$ has been widely used to understand and model nonlinear temporal systems in economics and finance, where $\mathbb{B}(t)$ denotes standard Brownian motion and $\sigma^{2}(\cdot)$ is the volatility function. Among others, Stanton (1997), Chapman & Pearson (2000) and Fan & Zhang (2003) considered the nonparametric estimation of $\mu(\cdot)$ and $\sigma^{2}(\cdot)$ . Zhao (2011) addressed the model validation problem for Eq. 2. In particular, Eq. 2 can be used to model the temporal dynamics of financial data with $\{X_{t}\}$ being interest rates, exchange rates, stock prices or other economic quantities. Among others, Zhao & Wu (2006) considered kernel quantile estimates of Eq. 2 for the exchange rates between Pound and USD. Liu & Wu (2010) constructed simultaneous confidence bands for $\mu(\cdot)$ and $\sigma(\cdot)$ with the U.S. Treasury yield curve rates data. See also the latter papers for further references. Observe that we allow the error process in Eq. 1 to be general martingale differences, which significantly expands the applicability of our theory and methodology in economic applications. As pointed out by one referee, conditional moment restrictions in dynamic economic models routinely arise from Euler/Bellman equations in dynamic programming, which are martingale differences. Furthermore, asset returns, due to the no-arbitrage theory, are (semi)martingales. Hence, their (demeaned) returns are martingale differences.

Throughout this article, following Chapter 6.3 of Fan & Yao (2003), we shall call Eq. 1 a state-domain nonlinear regression model. The term “state domain” originated from the celebrated state-space models (e.g. Kalman (1960) and Shumway & Stoffer (2000, Chapter 6)) where the dynamics of a sequence of state variables ( $\{X_{i}\}$ in Eq. 1) are driven by a group of control variables ( $\epsilon_{i}$ in Eq. 1) through the nonlinear state equation Eq. 1. Therefore, in this article the term “state domain” refers to the Euclidean space in which the variables on the axes are the state variables. Observe that the state-domain nonlinear regression Eq. 1 aims to characterize the relationship between $X_{i}$ and past values (states) of the time series through a discretized stochastic differential equation. In contrast, the time-domain nonlinear regression (see e.g. Fan & Yao (2003), Chapter 6.2)

$$X_{i}=f(i/n)+\varepsilon_{i},\quad i=1,2,\dots,n\tag{3}$$

with $\mathbb{E}[\varepsilon_{i}]=0$ describes the relationship between $X_{i}$ and time.

To date, most investigations on the nonparametric inference procedure of Eq. 1 are based on the assumption that the underlying regression function $\mu(\cdot)$ is continuous, which may cause serious restrictions in many real applications. In fact, in parametric modeling of nonlinear time series, various choices of $\mu(\cdot)$ with possible discontinuities have drawn much attention in the literature. One of the most prominent examples is the threshold model proposed by Tong & Lim (1980), in which regime switches are triggered by an observed variable crossing an unknown threshold. Also, AR model with regime-switch controlled by a Markov chain mechanism was introduced by Tong (1990). In economics, the expanding phase and contracting phase are not always governed by the same dynamics, see Tiao & Tsay (1994); Durlauf & Johnson (1995); McConnell & Perez-Quiros (2000) and other references therein. As a result, the occurrence of abrupt changes in the state-domain regression function $\mu(\cdot)$ is common and detecting as well as estimating them are of vital importance. Motivated by this, in the current paper we focus on the situation where the regression function $\mu(\cdot)$ is piece-wise smooth on an interval of interest $T=[l,u]$ with a finite but unknown number of change points. More precisely, there exist $l=a_{0}<a_{1}<\cdots<a_{M}<a_{M+1}=u$ such that $\mu(\cdot)$ is smooth on each of the intervals $[a_{0},a_{1}),\cdots,[a_{M},a_{M+1}]$ ; that is, on the interval $[l,u]$

$$\mu(x)=\sum_{j=0}^{M}\mu_{j}(x)\mathbbm{1}(a_{j}\leq x<a_{j+1}),\tag{4}$$

where $M$ is the total number of change points. Throughout this article, we assume $M$ is fixed.
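To make the piecewise formulation concrete, the following is a minimal simulation sketch of Eq. 1 with a discontinuous $\mu(\cdot)$ as in Eq. 4. The helper names `piecewise_mu` and `simulate_state_domain`, the i.i.d. standard normal errors, and the particular pieces (one change point at $a_{1}=0$ with jump size 2) are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def piecewise_mu(x, change_points, pieces):
    """Evaluate mu(x) = sum_j mu_j(x) 1(a_j <= x < a_{j+1}).

    change_points: sorted interior change points a_1 < ... < a_M.
    pieces: list of M + 1 callables mu_0, ..., mu_M (hypothetical choices).
    """
    # side="right" assigns x = a_j to the piece on its right, matching a_j <= x.
    j = np.searchsorted(change_points, x, side="right")
    return pieces[j](x)

def simulate_state_domain(n, change_points, pieces, rng, burn_in=200):
    """Simulate X_i = mu(X_{i-1}) + eps_i with i.i.d. N(0, 1) errors (an assumption)."""
    x = 0.0
    path = np.empty(n)
    for i in range(-burn_in, n):
        x = piecewise_mu(x, change_points, pieces) + rng.standard_normal()
        if i >= 0:
            path[i] = x
    return path

# One change point at a_1 = 0 with jump size 2 (illustrative values only).
pieces = [lambda x: 0.3 * x - 1.0, lambda x: 0.3 * x + 1.0]
rng = np.random.default_rng(0)
X = simulate_state_domain(1000, [0.0], pieces, rng)
```

The discontinuity sits in the state variable: the regime is selected by where $X_{i-1}$ falls, not by the time index $i$.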

To our knowledge, there exist no results on change point detection of the state-domain regression function $\mu(\cdot)$ in the literature. The purpose of this paper is twofold. First we want to test whether $\mu(x)$ is smooth or discontinuous on the interval $[l,u]$ ; that is to test the null hypothesis $H_{0}:M=0$ of Eq. 4. By sliding a density-weighted anti-symmetric kernel through the state domain, we shall suggest a nonparametric test statistic and non-trivially apply the discretized multivariate Gaussian approximation result of Zaitsev (1987) to establish its asymptotic distribution. Additionally, the Gaussian approximation results also directly suggest a finite sample simulation-based bootstrapping method which improves the accuracy of the test in practical implementations. Second, if $M\geq 1$ , we reject the null hypothesis and subsequently want to locate all the change points. In this case, we propose an estimation procedure and establish the corresponding asymptotic theory on the accuracy of the estimators. Finally, the above theoretical results are of general interest and could be used for a wider class of state-domain change point detection problems.

There is a long-standing literature in statistics on jump detection for the time-domain nonlinear regression model Eq. 3, where occasional jumps occur in an otherwise smoothly changing time trend $f(\cdot)$ . It is impossible to give a complete list of references here, so we only list some representative works. Müller (1992) and Eubank & Speckman (1994) employed a kernel method to estimate jump points in smooth curves. Wang (1995) suggested using wavelets and provided a review of jump-point estimation. Two-step methods were considered by Müller & Song (1997) and Gijbels et al. (1999) to study the asymptotic convergence properties of the jumps. Later, Gijbels et al. (2007) suggested a compromise estimation method which can preserve possible jumps in the curve. Zhang (2016) considered the situation where the trend function allows a growing number of jump points. In econometrics, there is a significant body of literature discussing time-domain jump detection in jump diffusion models; see for instance Bollerslev et al. (2008); Jiang & Oomen (2008); Lee & Mykland (2012) and the references therein. On the other hand, it is well known that state-domain asymptotic theory is very different from that of the time domain (see, for instance, Fan & Yao (2003), Chapter 6). In our specific case, the uniform asymptotic behavior of our test statistic on $[l,u]$ is arguably more difficult to establish than the corresponding problem in the time domain. In the current paper, we establish that, unlike time-domain change point detection problems for Eq. 3, where the long-run variances of the process are of crucial importance in the asymptotics, state-domain change point detection theory for Eq. 1 depends heavily on the conditional variances and densities of the process $\{X_{i}\}$ . We also provide an estimation procedure using simulated critical values to detect and locate the change points. We show that, when the jump sizes have a fixed and positive lower bound, the method will asymptotically detect all the change points with a preassigned probability and an accuracy $c_{n}$ which is much smaller than $1/\sqrt{n}$ , where $n$ is the length of the time series.

The rest of the paper is organized as follows. In Section 2, we introduce the model framework and some basic assumptions. Section 3 contains our main results, including a nonparametric test to determine the existence of change points and a procedure for estimating the number of change points together with their locations. Practical implementation based on a bootstrap procedure and a suitable method for bandwidth selection are discussed in Section 4. Section 5 reports some simulation studies. A real data application of daily COVID-19 infections in Germany is carried out in Section 6. Section 7 contains all the proofs of the theoretical results in Section 3.

2. Model Formulation and Basic Assumptions

Throughout this paper, we use the following notations. A random vector $X\in\mathcal{L}^{p}$ if $\|X\|_{p}:=(\mathbb{E}|X|^{p})^{1/p}<\infty$ . For two random variables $U$ and $V$ , $F_{U\,|\,V}(\cdot)$ denotes the conditional distribution function of $U$ given $V$ and $f_{U\,|\,V}(\cdot)$ denotes the conditional density. Furthermore, for function $g$ with $\mathbb{E}|g(U)|<\infty$ , we let $\mathbb{E}(g(U)\,|\,V):=\int g(x)\mathrm{d}F_{U\,|\,V}(x)$ be the conditional expectation of $g(U)$ given $V$ . Finally, $\mathbbm{1}$ stands for the indicator function.

Assume that the process $\{\epsilon_{i}\}$ is stationary and causal. Following Wu (2005), we assume that $\{\epsilon_{i}\}$ is a Bernoulli shift process such that

$$\epsilon_{i}=G^{*}(\xi_{i}),\tag{5}$$

where the function $G^{*}$ is a measurable function such that the process $\{\epsilon_{i}\}$ exists and $\xi_{i}=(\cdots,\eta_{i-1},\eta_{i})$ is a shift process, where $\{\eta_{i}\}$ are independent and identically distributed (i.i.d.) random variables. Furthermore, $\{\epsilon_{i}\}$ is a martingale difference sequence satisfying $\mathbb{E}[\epsilon_{i}\mid(\epsilon_{i-1},\epsilon_{i-2},\cdots)]=0$ almost surely. From Eq. 5, one can interpret the transform $G^{*}$ as the underlying physical mechanism, with $\xi_{i}$ and $G^{*}(\xi_{i})$ being the input and output of the system, respectively.

Similarly, we assume

$$X_{i}=G(\xi_{i}),$$

where $G$ is a measurable function such that $X_{i}$ exists. To facilitate the main results, we first introduce the time series dependence measures in Wu (2005) associated with $X_{i}$ and $\epsilon_{i}$ . Assume $X\in\mathcal{L}^{p}$ , and let

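$$X_{n}^{\prime}=G(\xi_{n}^{\prime}),\quad\xi_{n}^{\prime}:=(\xi_{-1},\eta_{0}^{\prime},\eta_{1},\dots,\eta_{n}),$$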

where $X_{n}^{\prime}$ is a coupled process of $X_{n}$ with $\eta_{0}$ replaced by an i.i.d. copy $\eta_{0}^{\prime}$ . Then, we define the physical dependence measures of $X_{i}$ as

$$\theta_{n,p}=\|X_{n}-X_{n}^{\prime}\|_{p}.$$

Let $\theta_{n,p}=0$ if $n<0$ . Thus for $n\geq 0,~{}\theta_{n,p}$ measures the dependence of the output $G(\xi_{n})$ on the single input $\eta_{0}$ . We refer to Wu (2005) for more details on the physical dependence measures.
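As an illustration of the coupling construction, the physical dependence measure can be approximated by Monte Carlo. The sketch below assumes the recursion $X_{i}=\mu(X_{i-1})+\eta_{i}$ with i.i.d. standard normal inputs and a finite burn-in standing in for the infinite past $\xi_{-1}$; the function name `physical_dependence` is hypothetical. For a linear AR(1) map $\mu(x)=ax$ the exact value is $\theta_{n,2}=|a|^{n}\sqrt{2}$, which gives a simple check.

```python
import numpy as np

def physical_dependence(mu, n, p=2, reps=20000, burn_in=100, seed=0):
    """Monte Carlo estimate of theta_{n,p} = ||X_n - X_n'||_p for
    X_i = mu(X_{i-1}) + eta_i with i.i.d. N(0, 1) inputs (an assumed model)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(reps)
    for _ in range(burn_in):                      # shared past, standing in for xi_{-1}
        x = mu(x) + rng.standard_normal(reps)
    x_c = x.copy()
    x = mu(x) + rng.standard_normal(reps)         # time 0: original input eta_0
    x_c = mu(x_c) + rng.standard_normal(reps)     # time 0: independent copy eta_0'
    for _ in range(n):                            # times 1..n: identical inputs
        eta = rng.standard_normal(reps)
        x, x_c = mu(x) + eta, mu(x_c) + eta
    return np.mean(np.abs(x - x_c) ** p) ** (1.0 / p)

# AR(1) with coefficient 0.5: the exact value is 0.5**3 * sqrt(2).
theta_3 = physical_dependence(lambda x: 0.5 * x, n=3)
```

The geometric decay of this quantity in $n$ is what the geometric moment contraction condition on $\theta_{n,p}$ asks for.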

Similarly, we define the physical dependence measures for the errors as

$$\theta_{n,p}^{*}=\|\epsilon_{n}-\epsilon_{n}^{\prime}\|_{p},$$

where $\epsilon_{n}^{\prime}=G^{*}(\xi_{n}^{\prime})$ . Let $\theta^{*}_{n,p}=0$ if $n<0$ .

Suppose that $\{X_{i}\}_{i=1}^{n}$ is observed. Recall $H_{0}:M=0$ and we aim to test the null hypothesis that the regression function is smooth. To this end, we introduce a density-weighted anti-symmetric kernel function $\tilde{K}_{n}$ , which is defined by

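$$\tilde{K}_{n}(X,x,b):=\frac{w_{n}^{*}(x,b)\,K\left(\frac{X-x}{b}\right)-w_{n}(x,b)\,K^{*}\left(\frac{X-x}{b}\right)}{w_{n}(x,b)\,w_{n}^{*}(x,b)},$$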

where $K(\cdot)$ is a kernel function supported on $S=[0,1]$ with $\int_{S}K(u)\mathrm{d}u=1$ and $K^{*}(u):=K(-u)$ . The data-dependent weights $w_{n}(x,b)$ and $w_{n}^{*}(x,b)$ are defined by

$$w_{n}(x,b):=\frac{1}{nb}\sum_{i=1}^{n}K\left(\frac{X_{i}-x}{b}\right),\qquad w_{n}^{*}(x,b):=\frac{1}{nb}\sum_{i=1}^{n}K^{*}\left(\frac{X_{i}-x}{b}\right),$$

where $b=b_{n}$ is the bandwidth satisfying $b\to 0$ and $nb\to\infty$ . Note that $w_{n}(x,b)$ and $w_{n}^{*}(x,b)$ are one-sided kernel density estimators. Hence $\tilde{K}_{n}(X,x,b)$ can be approximated by $[K(\frac{X-x}{b})-K^{\ast}(\frac{X-x}{b})]/f(x)$ , where $f(x)$ is the density function of $X_{i}$ . Since $K(x)-K^{\ast}(x)$ is an anti-symmetric function, we call $\tilde{K}_{n}(X,x,b)$ a density-weighted anti-symmetric kernel function. By sliding this kernel function $\tilde{K}_{n}$ through the state domain, we are able to test whether $\mu(x)$ has change points. More specifically, the quantity $\sum_{k=2}^{n}\tilde{K}_{n}(X_{k-1},x,b)X_{k}/{nb}$ is a boundary kernel estimate of $\mu(x^{+})-\mu(x^{-})$ , where $\mu(x^{+})$ and $\mu(x^{-})$ are the right and left limits of $\mu(\cdot)$ at $x$ . Thus, if $x$ is a continuity point of $\mu(\cdot)$ , this quantity will be approximately zero. However, if $\mu(\cdot)$ is discontinuous at $x$ , the quantity will be approximately equal to the jump size of $\mu(\cdot)$ at $x$ . To establish our first main result, we need the following regularity conditions:

(a) There exist $0<\delta_{2}\leq\delta_{1}<1$ such that $n^{-\delta_{1}}=\mathcal{O}(b)$ and $b=\mathcal{O}(n^{-\delta_{2}})$ .

(b) Assume $\mathbb{E}|\epsilon_{i}|^{p}<\infty$ where $p>2/(1-\delta_{1})$ .

(c) For the same $p$ defined in Condition (b), assume that $X_{i}\in\mathcal{L}^{p}$ , $\theta_{n,p}=\mathcal{O}(\rho^{n})$ , and $\theta^{*}_{n,p}=\mathcal{O}(\rho^{n})$ for some $0<\rho<1$ .

(d) The density function $f$ of $X_{i}$ is positive on $[l-\epsilon,u+\epsilon]$ for some $\epsilon>0$ and there exists a constant $B<\infty$ such that

$$\sup_{x}\left[|f_{X_{n}\mid\xi_{n-1}}(x)|+|f^{\prime}_{X_{n}\mid\xi_{n-1}}(x)|+|f^{\prime\prime}_{X_{n}\mid\xi_{n-1}}(x)|\right]\leq B,\quad\text{a.s.}$$

(e) $K(\cdot)$ is differentiable over $(0,1)$ , the right derivative $K^{\prime}(0+)$ and the left derivative $K^{\prime}(1-)$ exist and $\sup_{0\leq u\leq 1}|K^{\prime}(u)|<\infty$ . The Lebesgue measure of the set $\{u\in[0,1]:K(u)=0\}$ is zero. Furthermore, $K(0)=K(1)=0$ , $K^{\prime}(0)>0$ and $\int_{0}^{1}uK(u)\mathrm{d}u=0$ .

For the above regularity conditions, Condition (a) specifies the allowable range of the bandwidth. Condition (b) puts a mild moment restriction on $\epsilon_{i}$ . Condition (c) requires that the quantities $\theta_{n,p}$ and $\theta_{n,p}^{*}$ satisfy the geometric moment contraction (GMC) property. The GMC property is preserved in many linear and nonlinear time series models such as the ARMA models and the ARCH and GARCH models; see Shao & Wu (2007) for more discussions. Furthermore, denote $\Theta_{n}:=\sum_{i=0}^{n}\theta_{i,2}$ , which measures the cumulative dependence of $X_{0},...,X_{n}$ on $\eta_{0}$ . Then if Condition (c) holds, it is easy to see that $\Theta_{\infty}<\infty$ which indicates short-range dependence of $\{X_{i}\}$ . With Condition (d), we require that the density and conditional density of $X_{i}$ exist and are bounded. Moreover, $f$ has bounded derivatives up to the second order. Condition (e) puts some restrictions on the smoothness and order of the kernel function $K$ . In particular, $\int_{0}^{1}uK(u)\mathrm{d}u=0$ indicates that $K$ is a second-order kernel which has both positive and negative parts on $[0,1]$ .
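The density-weighted anti-symmetric kernel and the resulting jump-size estimate can be sketched as follows. The polynomial kernel $K(u)=12u(1-u)(3-5u)$ is a hypothetical choice that we picked because it meets the algebraic requirements of Condition (e) ($\int_{0}^{1}K=1$, $K(0)=K(1)=0$, $K^{\prime}(0)=36>0$, $\int_{0}^{1}uK=0$); the paper's own kernel is not given in this section. As a further simplification, the one-sided weights are computed from the lagged states $X_{k-1}$ only, and the simulated series (jump of size 2 at $x=0$) is illustrative.

```python
import numpy as np

def K(u):
    """Hypothetical kernel on [0, 1]: K(u) = 12 u (1 - u) (3 - 5u).
    It integrates to 1, vanishes at 0 and 1, and has zero first moment."""
    u = np.asarray(u, dtype=float)
    inside = (u >= 0.0) & (u <= 1.0)
    return np.where(inside, 12.0 * u * (1.0 - u) * (3.0 - 5.0 * u), 0.0)

def jump_estimate(X, x, b):
    """(1/(nb)) sum_k K~_n(X_{k-1}, x, b) X_k, an estimate of mu(x+) - mu(x-)."""
    lag, resp = X[:-1], X[1:]
    n = len(lag)
    right = K((lag - x) / b)        # K: weights states in [x, x + b]
    left = K(-(lag - x) / b)        # K*(u) = K(-u): weights states in [x - b, x]
    w = right.sum() / (n * b)       # one-sided density estimate w_n(x, b)
    w_star = left.sum() / (n * b)   # one-sided density estimate w_n^*(x, b)
    k_tilde = (w_star * right - w * left) / (w * w_star)
    return np.sum(k_tilde * resp) / (n * b)

# Illustrative series: mu(x) = 0.3 x + 1 for x >= 0 and 0.3 x - 1 otherwise.
rng = np.random.default_rng(1)
n = 5000
X = np.empty(n + 1)
X[0] = 0.0
for i in range(1, n + 1):
    shift = 1.0 if X[i - 1] >= 0 else -1.0
    X[i] = 0.3 * X[i - 1] + shift + rng.standard_normal()

at_jump = jump_estimate(X, 0.0, b=0.5)   # should be near the jump size 2
away = jump_estimate(X, 1.5, b=0.5)      # should be near 0 at a continuity point
```

Because the kernel takes negative values, the estimate is a difference of two one-sided local averages with signed weights, which removes the first-order bias at the price of a larger variance.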

3. State-domain Change Point Detection and Estimation

In this section, we propose a test on the existence of change points in $\mu(\cdot)$ and an algorithm to estimate the number and locations of the change points when $\mu(\cdot)$ is discontinuous.

3.1. Test for the existence of change points.

With the foregoing discussion, we introduce a nonparametric statistic based on the density-weighted anti-symmetric kernel to test whether model Eq. 1 has change points in the state domain regression function $\mu(\cdot)$ on $[l,u]$ . By proper scaling, our test statistic is defined as

$$t_{n}(x):=\frac{\sqrt{f(x)}}{\sigma(x)}\frac{1}{nb}\sum_{k=1}^{n}\tilde{K}_{n}(X_{k-1},x,b)X_{k},$$

where $\sigma^{2}(x)=\mathbb{E}[\epsilon_{i}^{2}|X_{i-1}=x]$ . In practice, since $f(\cdot)$ and $\sigma(\cdot)$ are unknown, we use the kernel density estimator $f_{n}(x)$ and Nadaraya–Watson (NW) estimator $\sigma^{2}_{n}(x)$ to replace $f(x)$ and $\sigma^{2}(x)$ , respectively. The kernel density estimator is given by

$$f_{n}(x)=\frac{1}{nh}\sum_{k=2}^{n}W\left(\frac{X_{k-1}-x}{h}\right),$$

where $W(\cdot)$ is a general kernel function with $W(\cdot)\geq 0$ and $\int W(u)\mathrm{d}u=1,~{}h=h_{n}$ is the bandwidth sequence satisfying $h\to 0$ and $nh\to\infty$ . Let $\hat{e}_{k}^{2}=[X_{k}-\mu_{n}(X_{k-1})]^{2}$ be the square of the estimated residuals, where

$$\mu_{n}(x)=\frac{1}{nhf_{n}(x)}\sum_{k=2}^{n}W\left(\frac{X_{k-1}-x}{h}\right)X_{k}$$

is the NW estimator of $\mu(\cdot)$ , then the NW estimator of $\sigma^{2}(x)$ is given by

$$\sigma_{n}^{2}(x)=\frac{1}{nhf_{n}(x)}\sum_{k=2}^{n}W\left(\frac{X_{k-1}-x}{h}\right)\hat{e}_{k}^{2}.$$
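The plug-in estimators $f_{n}$ , $\mu_{n}$ and $\sigma_{n}^{2}$ can be sketched as below. This is a minimal illustration assuming a Gaussian kernel for $W$ (any nonnegative kernel integrating to one would do) and a simulated AR(1) series; the function name `nw_estimates` and the bandwidth value are hypothetical.

```python
import numpy as np

def W(u):
    """Gaussian kernel: nonnegative and integrates to 1 (one admissible choice)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def nw_estimates(X, grid, h):
    """Return f_n, mu_n and sigma_n^2 evaluated at the points in `grid`."""
    lag, resp = X[:-1], X[1:]
    n = len(X)

    def weights(points):
        pts = np.asarray(points, dtype=float)
        return W((lag[None, :] - pts[:, None]) / h)   # shape (len(points), n - 1)

    def smooth(points, y):
        w = weights(points)
        # n h f_n(x) equals the sum of the weights, so the NW ratio simplifies.
        return (w @ y) / w.sum(axis=1)

    f_n = weights(grid).sum(axis=1) / (n * h)
    mu_n = smooth(grid, resp)
    resid2 = (resp - smooth(lag, resp)) ** 2          # squared residuals e_k^2
    sigma2_n = smooth(grid, resid2)
    return f_n, mu_n, sigma2_n

# Illustrative AR(1) series with mu(x) = 0.5 x and unit error variance.
rng = np.random.default_rng(2)
X = np.zeros(2001)
for i in range(1, 2001):
    X[i] = 0.5 * X[i - 1] + rng.standard_normal()
f_n, mu_n, sigma2_n = nw_estimates(X, grid=[0.0, 1.0], h=0.3)
```

The residual step evaluates $\mu_{n}$ at every observed state, which costs $O(n^{2})$ kernel evaluations; for long series a binned approximation may be preferable.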

The following remark provides the uniform consistency of the estimated density and conditional variance functions.

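*Remark 3.1.*

Under Condition (a) for both bandwidths $h$ and $b$ with $0<\delta_{1}<1/4$ , Condition (c), Condition (d), and Condition (e), we have

$$\mathbb{E}f_{n}(x)-f(x)=f^{\prime\prime}(x)h^{2}\psi_{W}+o(h^{2}),$$

where $\psi_{W}:=\int u^{2}W(u)\mathrm{d}u/2$ and

$$\sup_{x}|f_{n}(x)-f(x)|=\mathcal{O}_{\mathbb{P}}\left(\sqrt{\frac{(\log n)^{3}}{nh}}+h^{2}\log n\right).$$

Similarly, for $\sigma^{2}_{n}(x)$ , under the conditions of Theorem 3.2, we also have

$$\sup_{x}|\sigma_{n}^{2}(x)-\sigma^{2}(x)|=\mathcal{O}_{\mathbb{P}}\left(\sqrt{\frac{(\log n)^{3}}{nh}}+h^{2}\log n\right).$$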

See Section 8.1 for the proof.

Let $f_{\epsilon}(\cdot)$ be the density function of $\epsilon_{i}$ and $\lambda_{K}=\int K^{2}(x)\mathrm{d}x$ . We have the following main result on the asymptotic properties of the proposed test statistic.

Theorem 3.2.

Let $l,u\in\mathbb{R}$ be fixed. Recall the piece-wise formulation of Eq. 4, let $T_{j}^{\epsilon}$ and $T^{\epsilon}$ be the $\epsilon$ -neighborhoods of the intervals $T_{j}=[a_{j},a_{j+1})$ and $T=[l,u]$ , respectively. Let $T_{a}=\{a_{j}\}$ be the collection of the change points, $T_{a}^{\epsilon}$ be the $\epsilon$ -neighborhood of $T_{a}$ . Assume that Condition (a)-Condition (e) hold with $f_{\epsilon}(\cdot),~{}\sigma(\cdot)\in\mathcal{C}^{3}(T^{\epsilon}),~{}\mu_{j}(\cdot)\in\mathcal{C}^{3}(T_{j}^{\epsilon})$ for some $\epsilon>0$ and $b$ satisfies

$$0<\delta_{1}<1/3,\quad 0<\delta_{2}\leq 1/4,\quad nb^{9}\log n=o(1),$$

then

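$$\mathbb{P}\left(\sqrt{\frac{nb}{2\lambda_{K}}}\sup_{x\in T\cap(T_{a}^{b})^{c}}|t_{n}(x)|-d_{n}\leq\frac{z}{(2\log\bar{b}^{-1})^{1/2}}\right)\to e^{-2e^{-z}},$$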

where $\bar{b}:=b/(u-l)$ and

$$d_{n}:=(2\log\bar{b}^{-1})^{1/2}+\frac{1}{(2\log\bar{b}^{-1})^{1/2}}\log\frac{\sqrt{K_{2}}}{2\pi}$$

with $K_{2}:=\int_{0}^{1}(K^{\prime}(u))^{2}\mathrm{d}u/\lambda_{K}$ .

Proof.

See Section 7.1. ∎

Theorem 3.2 is a general result which establishes the asymptotic theory of the test statistic. In practical implementation, we will use the density estimates $f_{n}(x)$ and variance estimates $\sigma_{n}(x)$ instead of $f(x)$ and $\sigma(x)$ to calculate $t_{n}(x)$ as discussed before. Therefore, we have the following corollary.

Corollary 3.3.

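Denote $t_{n}^{\ast}(x)=\frac{\sqrt{f_{n}(x)}}{\sigma_{n}(x)}\frac{1}{nb}\sum_{k=1}^{n}\tilde{K}_{n}\left(X_{k-1},x,b\right)X_{k}$ . Under the conditions of Theorem 3.2, and further assuming that the bandwidth satisfies $h\leq b$ , the asymptotic result of Theorem 3.2 holds for $t_{n}^{\ast}(x)$ ; that is,

$$\mathbb{P}\left(\sqrt{\frac{nb}{2\lambda_{K}}}\sup_{x\in T\cap(T_{a}^{b})^{c}}|t_{n}^{*}(x)|-d_{n}\leq\frac{z}{(2\log\bar{b}^{-1})^{1/2}}\right)\to e^{-2e^{-z}}.$$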

Note that in Corollary 3.3, we have added the assumption $h\leq b$ with the purpose of ensuring the consistency of $f_{n}(x)$ and $\sigma_{n}(x)$ on $T\cap(T_{a}^{b})^{c}$ . When there is no change point in $\mu(\cdot)$ , we have similar results as shown in the following remark, which suggests that under the null hypothesis, after proper scaling and centering, our test statistic converges to a Gumbel distribution asymptotically.

*Remark 3.4.*

Assume $H_{0}:M=0$ holds. We further assume that $f(\cdot),~{}\sigma(\cdot)\in\mathcal{C}^{3}(T^{\epsilon})$ and the remaining conditions of Corollary 3.3 hold. Then, $T_{a}=\emptyset$ , $T_{a}^{b}=\emptyset$ , which implies $T\cap(T_{a}^{b})^{c}=T$ . Therefore, the previous theorem reduces to

$$\mathbb{P}\left(\sqrt{\frac{nb}{2\lambda_{K}}}\sup_{x\in T}|t_{n}^{*}(x)|-d_{n}\leq\frac{z}{(2\log\bar{b}^{-1})^{1/2}}\right)\to e^{-2e^{-z}}.$$

Denote the jump-size of $\mu(\cdot)$ at $a_{i}$ as $\Delta_{i}$ ; that is, $\Delta_{i}:=|\mu(a_{i}+)-\mu(a_{i}-)|$ . Next, we consider the alternative hypothesis $H_{a}:M\geq 1$ with $\Delta_{i}\geq\tilde{\Delta}>0$ . When $H_{a}$ holds true, it is easy to see that the proposed test has an asymptotic power $1$ as $n\to\infty$ . In other words, with some preassigned level $\alpha\in(0,1)$ and as $n\to\infty$ , we have

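$$\mathbb{P}\left(\sup_{x\in T}|t_{n}(x)|\geq\sqrt{\frac{2\lambda_{K}}{nb}}\left[d_{n}-\frac{\log\{\log(1-\alpha)^{-1/2}\}}{(2\log\bar{b}^{-1})^{1/2}}\right]\right)\to 1.$$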

Once the null hypothesis of no change point is rejected, one would be interested in detecting the number of change points together with their locations, which we discuss in Section 3.2.
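For orientation, the asymptotic critical value implied by the Gumbel limit can be computed directly. The sketch below rests on our reading of the limit: the statistic is scaled by $\sqrt{nb/(2\lambda_{K})}$ and the centering constant is $d_{n}=(2\log\bar{b}^{-1})^{1/2}+(2\log\bar{b}^{-1})^{-1/2}\log(\sqrt{K_{2}}/(2\pi))$ ; both square roots are assumptions where the extracted formulas are ambiguous. The function name `gumbel_critical_value` is hypothetical, and the example constants $\lambda_{K}=192/35$ and $K_{2}=35$ belong to the illustrative kernel $K(u)=12u(1-u)(3-5u)$ , not to the paper.

```python
import math

def gumbel_critical_value(n, b, interval_length, lam_K, K2, alpha=0.05):
    """Level-alpha threshold for sup_x |t_n(x)| implied by the Gumbel limit.

    interval_length is u - l, lam_K is the integral of K^2, and
    K2 is the integral of (K')^2 divided by lam_K.
    """
    b_bar = b / interval_length
    root = math.sqrt(2.0 * math.log(1.0 / b_bar))
    d_n = root + math.log(math.sqrt(K2) / (2.0 * math.pi)) / root
    # z solves exp(-2 exp(-z)) = 1 - alpha.
    z = -math.log(math.log((1.0 - alpha) ** -0.5))
    return math.sqrt(2.0 * lam_K / (n * b)) * (d_n + z / root)

cv_500 = gumbel_critical_value(500, 0.2, 2.0, lam_K=192.0 / 35.0, K2=35.0)
cv_2000 = gumbel_critical_value(2000, 0.2, 2.0, lam_K=192.0 / 35.0, K2=35.0)
```

Since convergence to the Gumbel law is slow, such a threshold is mainly a sanity check; simulated critical values are the practical choice.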

3.2. Change-point Estimation

Suppose there exist a fixed number $M$ of change points on $[l,u]$ , which are denoted by $l<a_{1}<\dots<a_{M}<u$ , with the minimum jump size $\min_{1\leq i\leq M}\Delta_{i}\geq\tilde{\Delta}_{n}>0$ . In this paper, we assume $\tilde{\Delta}_{n}=\mathcal{O}(1)$ which is allowed to decrease with $n$ . The idea for estimating the number and locations of the change points is to search for local maxima of $|t_{n}(x)|$ which exceed the critical value of the test. To be more specific, we propose the following procedure for change point estimation.

• For a fixed level $\alpha$ , perform the bootstrap procedure described in Section 4.1 to determine the critical value, say $C_{n,\alpha}$ .

• Set $T_{1}:=(l,u)$ .

• Starting from the interval $T_{1}$ , find the largest value of $|t_{n}(x)|$ that exceeds the critical value and denote its location as $\hat{a}_{(1)}$ , then rule out the interval $[\hat{a}_{(1)}-b,\hat{a}_{(1)}+b]$ from $T_{1}$ to get $T_{2}:=T_{1}\cap[\hat{a}_{(1)}-b,\hat{a}_{(1)}+b]^{c}$ .

• Repeat the previous step until all significant local maxima are found. In other words, the values of $|t_{n}(x)|$ on the remaining intervals are all below $C_{n,\alpha}$ .

• Denote the number of detected change points by $\hat{M}$ and re-order the estimated change points as $l<\hat{a}_{1}<\dots<\hat{a}_{\hat{M}}<u$ .
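The search steps above can be sketched as a greedy loop over a grid of candidate locations. This is a minimal illustration, assuming $|t_{n}(x)|$ has already been evaluated on a grid and a critical value is available; the function name `estimate_change_points` and the synthetic statistic used in the example are hypothetical.

```python
import numpy as np

def estimate_change_points(grid, abs_t, crit, b):
    """Greedy search: repeatedly take the largest remaining |t_n(x)| above
    `crit`, record its location, and exclude a window of half-width b."""
    grid = np.asarray(grid, dtype=float)
    abs_t = np.asarray(abs_t, dtype=float)
    active = np.ones(grid.shape, dtype=bool)       # grid points still in T_k
    found = []
    while True:
        masked = np.where(active, abs_t, -np.inf)
        i = int(np.argmax(masked))
        if masked[i] <= crit:                      # nothing left above C_{n,alpha}
            break
        found.append(float(grid[i]))
        active &= np.abs(grid - grid[i]) > b       # rule out [a_hat - b, a_hat + b]
    return sorted(found)                           # re-ordered estimates

# Synthetic statistic with two peaks, at -1 and 0.5 (illustrative only).
grid = np.linspace(-2.0, 2.0, 401)
abs_t = (3.0 * np.exp(-((grid + 1.0) / 0.05) ** 2)
         + 2.0 * np.exp(-((grid - 0.5) / 0.05) ** 2) + 0.2)
a_hat = estimate_change_points(grid, abs_t, crit=1.0, b=0.2)
M_hat = len(a_hat)
```

The exclusion window of half-width $b$ prevents a single jump from being counted more than once, since the statistic stays elevated within one bandwidth of a true change point.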

The following theorem provides an asymptotic result for $\hat{M}$ and $\hat{a}_{i}$ .

Theorem 3.5.

Under the conditions of Theorem 3.2, we further assume that $K^{\prime}(\cdot)$ is differentiable over $(0,1)$ with $K^{\prime}(1)=0$ , the right derivative $K^{\prime\prime}(0+)$ and the left derivative $K^{\prime\prime}(1-)$ exist and $\operatorname*{\mathrm{sup}\vphantom{\mathrm{infsup}}}_{0\leq u\leq 1}|K^{\prime\prime}(u)|<\infty$ . The Lebesgue measure of the set $\{u\in[0,1]:K^{\prime}(u)=0\}$ is zero. If $\sqrt{\frac{\log n}{nb}}=o(\tilde{\Delta}_{n})$ then for any given level $\alpha$ , we have

[TABLE]

for any $c_{n}$ such that $1/c_{n}=\mathcal{O}\left(\tilde{\Delta}_{n}\sqrt{\frac{n}{b\log n}}\right)$.

Proof.

See Section 7.2. ∎

Theorem 3.5 reveals that for any given small probability $\alpha$ , with asymptotic probability $1-\alpha$ , our proposed procedure will correctly estimate all the change points with an accuracy $c_{n}$ . It is important to mention that when $\tilde{\Delta}_{n}=\tilde{\Delta}>0$ , that is, when the jump sizes have a fixed lower bound, the smallest order for $c_{n}$ is $\sqrt{b\log n/n}$ , which is smaller than $n^{-1/2}$ . It can also be seen as a product of $\sqrt{\log n}$ and the optimal convergence rate $(\sqrt{b/n})$ of time-domain change-point estimators established in Müller (1992). Hence, we conjecture that our rate $c_{n}$ is nearly optimal for state-domain change point detection.
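As a worked illustration of this rate comparison, take a fixed jump-size bound $\tilde{\Delta}_{n}=\tilde{\Delta}>0$ and assume additionally that $b\log n\to 0$ (an assumption made here for the illustration; it holds, for example, when $b$ decays polynomially in $n$). The condition on $c_{n}$ then gives:

```latex
\frac{1}{c_{n}}=\mathcal{O}\!\left(\tilde{\Delta}\sqrt{\frac{n}{b\log n}}\right)
\;\Longrightarrow\;
c_{n}\ \text{can be taken of order}\ \sqrt{\frac{b\log n}{n}}
      =\sqrt{\log n}\cdot\sqrt{\frac{b}{n}},
\qquad
\frac{\sqrt{b\log n/n}}{n^{-1/2}}=\sqrt{b\log n}\;\longrightarrow\;0 .
```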

4. Practical Implementation

4.1. The bootstrap procedure

It is well known that the convergence rate of the Gumbel distribution in Theorem 3.2 is slow. As a result, a very large sample size would be needed for the approximation to be reasonably accurate. To overcome this issue, we propose a simulation-based bootstrap procedure to improve the finite-sample performance of the proposed test. The bootstrap procedure is as follows.

• Generate i.i.d. standard normal random variables $U_{k},~k=0,\dots,n$.

• Compute the quantity $\Pi_{n}^{\ast}$ defined in Eq. 26. Repeat these two steps many times and take the empirical $(1-\alpha)$th quantile of the resulting values as the critical value of our test.
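The two steps above amount to a Monte Carlo quantile computation, which can be sketched as follows. The statistic of Eq. 26 is not reproduced here; `stat_fn` is a hypothetical placeholder that maps the simulated normals $U_{0},\dots,U_{n}$ to one draw of $\Pi_{n}^{\ast}$, and the stand-in statistic in the demo is for illustration only.

```python
import numpy as np

def bootstrap_critical_value(stat_fn, n, alpha=0.05, n_boot=2000, seed=0):
    """Empirical (1 - alpha) quantile of a simulated statistic.

    stat_fn : callable taking the array (U_0, ..., U_n) and returning one
              realization of Pi_n^* (hypothetical placeholder for Eq. 26)
    """
    rng = np.random.default_rng(seed)
    draws = np.empty(n_boot)
    for r in range(n_boot):
        u = rng.standard_normal(n + 1)   # i.i.d. N(0,1) variables U_0, ..., U_n
        draws[r] = stat_fn(u)
    return float(np.quantile(draws, 1.0 - alpha))

# Stand-in statistic (NOT the paper's Pi_n^*): the maximum absolute value.
crit = bootstrap_critical_value(lambda u: np.max(np.abs(u)), n=50,
                                alpha=0.05, n_boot=2000, seed=1)
```

In practice `stat_fn` would evaluate the supremum of $|t_{n}^{\ast}(x)|$ over $T$ for the given draw, conditional on the observed series.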

For the proposed bootstrap procedure, we have the following theoretical result, which shows that, with proper scaling and centering, $\Pi_{n}^{\ast}$ has the same asymptotic Gumbel distribution as $\Pi_{n}$.

Proposition 4.1.

Denote $\Pi_{n}=\operatorname*{\mathrm{sup}\vphantom{\mathrm{infsup}}}_{x\in T}|t_{n}^{\ast}(x)|$ and

[TABLE]

where $\{U_{k}\}_{k=0}^{n}$ are i.i.d. standard normal random variables and $g(x)$ is its density. Assume $H_{0}:M=0$ , Condition (a), Condition (e) hold and $b$ satisfies

[TABLE]

Then we have

[TABLE]

Proposition 4.1 shows that $\Pi_{n}^{\ast}$ and $\Pi_{n}$ have the same asymptotic Gumbel distribution with proper scaling and centering under the null hypothesis. Therefore, the $(1-\alpha)$ th quantile of $\Pi_{n}$ can be estimated consistently by calculating the empirical $(1-\alpha)$ th quantile $C_{n,\alpha}$ of $\Pi_{n}^{\ast}$ with a large number of replications by the bootstrap procedure. We reject the null hypothesis at level $\alpha\in(0,1)$ if $\Pi_{n}>C_{n,\alpha}$ . When implementing the procedure described in Section 3.2 for estimating the change points, we also suggest using $C_{n,\alpha}$ to find the detection region. Our numerical experiments suggest that the bootstrap method yields more accurate results than those based on the asymptotic limiting distribution under small or moderate sample sizes.

4.2. Bandwidth selection

The bandwidth used in $f_{n}(x)$ can be chosen based on classic bandwidth selectors for nonparametric kernel density estimation. However, the choices of the bandwidth $b$ for the test statistic $t_{n}^{\ast}(x)$ and $h$ for the estimated variance $\sigma_{n}^{2}(x)$ are nontrivial and of considerable practical interest. In this paper, we adopt the standard leave-one-out cross-validation criterion for bandwidth selection suggested by Rice & Silverman (1991):

[TABLE]

where $\mu_{n}^{(-k)}(X_{k})$ and $\sigma_{n}^{2(-k)}(X_{k})$ are the kernel estimators of $\mu$ and $\sigma^{2}$, respectively, computed from all observations except the $k$th. For example, a cross-validation bandwidth $\hat{b}$ can be obtained by minimizing ${\rm CV}(b)$ with respect to $b$, i.e., $\hat{b}=\arg\min_{b\in\mathcal{B}}{\rm CV}(b)$, where $\mathcal{B}$ is the allowable range of $b$. The bandwidth selection for $h$ is similar.
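A minimal sketch of leave-one-out bandwidth selection is given below. It is a simplification made for illustration: it uses a plain Nadaraya-Watson estimator and an unweighted squared prediction error, which may differ from the exact criterion ${\rm CV}(b)$ above. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel W(u) = 0.75 (1 - u^2) on |u| <= 1."""
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)

def loo_cv_score(x, y, b):
    """Leave-one-out squared prediction error of a Nadaraya-Watson fit."""
    w = epanechnikov((x[:, None] - x[None, :]) / b)
    np.fill_diagonal(w, 0.0)            # drop the k-th observation from its own fit
    denom = w.sum(axis=1)
    if np.any(denom <= 0.0):            # some point has no neighbour within b
        return np.inf
    pred = (w @ y) / denom
    return float(np.mean((y - pred) ** 2))

def select_bandwidth(x, y, candidates):
    """Return the candidate bandwidth with the smallest CV score."""
    scores = [loo_cv_score(x, y, b) for b in candidates]
    return float(candidates[int(np.argmin(scores))])

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = np.sin(2.0 * x) + 0.2 * rng.standard_normal(200)
candidates = np.linspace(0.05, 1.0, 20)   # plays the role of the range B
b_hat = select_bandwidth(x, y, candidates)
```

Bandwidths that leave some observation without neighbours are assigned an infinite score, so they are never selected.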

5. Simulation Study

In this section, we carry out Monte Carlo simulations to examine the finite-sample performance of our proposed test and estimator. Throughout the numerical experiments, the Epanechnikov kernel $W(x)=0.75(1-x^{2})\mathbbm{1}(|x|\leq 1)$ is used for estimating the density and conditional variances. On the other hand, we adopt the higher-order kernel function in the form $K(x)=b[\tilde{W}(x)-a\tilde{W}(\sqrt{a}x)]$ in the expression of $\tilde{K}_{n}$, where $\tilde{W}(x)$ is the kernel function on $[0,1]$ obtained by shifting and scaling $W(x)$. From Theorem 3.2, one can see that the power of our test increases as $\lambda_{K}$ decreases. As a result, we aim to maximize the quantity $Q(a,b)=\frac{\int_{0}^{\infty}K(x)\mathrm{d}x}{\sqrt{\int_{0}^{\infty}K^{2}(x)\mathrm{d}x}}$ with the constraints $\int_{0}^{\infty}K(x)\mathrm{d}x=1$ and $\int_{0}^{\infty}xK(x)\mathrm{d}x=0$ to choose $a$ and $b$. It turns out that $Q(a,b)$ is maximized at $a=0.34$ and $b=\frac{2}{\sqrt{0.34}-0.34}$. Hence, we will use $K(x)=\frac{2}{\sqrt{0.34}-0.34}[\tilde{W}(x)-0.34\tilde{W}(\sqrt{0.34}x)]$ in our simulations and data analysis.

5.1. Accuracy of bootstrap.

We run Monte Carlo simulations to study the accuracy of the proposed bootstrap procedure for finite samples $n=200,~{}500$ and 800. Here, we aim to test the null hypothesis $H_{0}$ of no change point in the regression function. The number of replications is fixed to be 1000 and the number of bootstrap samples is $B=2000$ at each replication.

To guarantee the stationarity of the process $\{X_{i}\}$, $|\mu(x)|$ is required to be less than one (Fan & Yao, 2003, Section 2.1). First, we consider Model A listed below to investigate the robustness of our testing procedure with respect to various levels of persistence in the data generating process. Four additional state-domain nonlinear models (Models B–E listed below), in which $\mu(\cdot)$ takes various shapes, are further investigated for the accuracy of our test. In our simulations, the martingale difference process is $\epsilon_{i}=\sigma(X_{i-1})\epsilon_{i}^{\ast}$ with $\sigma^{2}(x)=\mathbb{E}(\epsilon_{i}^{2}|X_{i-1}=x)$ and $\epsilon_{i}^{\ast}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\mathcal{N}(0,1)$. Note that the error processes $\{\epsilon_{i}\}$ are specified via different conditional variance functions $\sigma^{2}(x)$ in Models A–D. On the other hand, in Model E we set $\epsilon_{i}=0.5\eta_{i}(\eta_{i-7}+1.5)$, where $\eta_{i}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\mathcal{N}(0,1)$, so that $\{\epsilon_{i}\}$ has a period of 7, which matches the data generating process observed in the empirical data example in Section 6.

• Model A:

[TABLE]

where $\kappa_{1}=0.2,0.4,0.6,0.8$ represents various levels of temporal dependence in the series.

• Model B:

[TABLE]

• Model C:

[TABLE]

• Model D:

[TABLE]

• Model E:

[TABLE]
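The data generating mechanism shared by these models can be sketched as follows. The specific regression and variance functions of Models A–E are not reproduced here; `mu` and `sigma` in the demo are hypothetical placeholders, and the burn-in length is an arbitrary choice. Only the Model E error recipe $\epsilon_{i}=0.5\eta_{i}(\eta_{i-7}+1.5)$ is taken from the text.

```python
import numpy as np

def simulate_state_domain(mu, sigma, n, burn_in=200, seed=0):
    """Simulate X_i = mu(X_{i-1}) + sigma(X_{i-1}) * eps*_i, eps*_i i.i.d. N(0,1)."""
    rng = np.random.default_rng(seed)
    total = n + burn_in
    x = np.zeros(total + 1)
    eps_star = rng.standard_normal(total + 1)
    for i in range(1, total + 1):
        x[i] = mu(x[i - 1]) + sigma(x[i - 1]) * eps_star[i]
    return x[burn_in + 1:]               # discard the burn-in segment

def model_e_errors(n, seed=0):
    """Errors eps_i = 0.5 * eta_i * (eta_{i-7} + 1.5) with eta_i i.i.d. N(0,1)."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n + 7)     # seven extra draws supply the lagged values
    return 0.5 * eta[7:] * (eta[:-7] + 1.5)

# Hypothetical bounded drift and constant volatility (NOT one of Models A-E).
x_sim = simulate_state_domain(mu=lambda v: 0.5 * np.tanh(v),
                              sigma=lambda v: 0.5, n=500, seed=1)
eps_e = model_e_errors(300, seed=1)
```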

Note that the regression functions $\mu(\cdot)$ in Models A–E are all continuous. At nominal significance levels $\alpha=0.05$ and $0.1$ , the simulated Type I error rates for sample sizes $n=200,500$ and $800$ are reported in Tables 1–2 for Model A and Models B–E, respectively. To measure the strength of the nonlinear temporal dependence, we will employ the auto-distance correlation function (ADCF) investigated in Zhou (2012). In Table 1, we illustrate the first order ADCF (denoted by $\mathcal{R}(1)$ ) for Model A. Meanwhile, for Model E the first order and the seventh order ADCF are listed in Table 2. One can see that the performance of our testing procedure is reasonably accurate for different sample sizes across the models and the accuracy improves as the sample size increases. On the other hand, from Table 1, we find that as the dependence of the process becomes stronger, the type I errors tend to be less accurate, but are still in a reasonable range.

5.2. Power of hypothesis testing

In this subsection, we consider the simulated power of our test under various alternatives. Recall the representation $\epsilon_{i}=\sigma(X_{i-1})\epsilon_{i}^{\ast}$ with $\epsilon_{i}^{\ast}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\mathcal{N}(0,1)$ . Here, we consider the following two types of alternatives with a change point of size $\delta$ :

• Model F1:

[TABLE]

• Model F2:

[TABLE]

We let the jump size $\delta$ range from [math] to $1.6$ for model F1 and from [math] to $1$ for model F2 at location $x=0$ . For each model, we investigate the empirical sensitivity of our testing procedure under nominal levels $0.05$ and $0.1$ with sample size $n=800$ based on $1000$ replications. The simulated power curves for the above models are plotted in Fig. 1 and Fig. 2, respectively. According to the plots, the statistical power of the proposed testing procedure increases reasonably fast as $\delta$ increases. On the other hand, we also observe that our test shows a slower speed of increase at near alternatives when compared with “classic” power curves of parametric tests. We believe that part of the reason is that our nonparametric test aims at detecting alternatives from a large class of discontinuous functions while tests tailored to some parametric models (such as the threshold model) target a specific class of alternative functions. Therefore our test is expected to be less sensitive to small deviations from the null compared to those parametric tests. See also Section 5.4 for a numerical experiment that compares the sensitivity of our testing procedure with that of a parametric test of the threshold model.

5.3. Accuracy in estimating the number of change points and their locations

Utilizing the algorithm listed in Section 3.2, in this subsection we focus on estimating the number of change points and their locations based on $1000$ realizations with sample sizes $n=200,~{}500$ and $800$ . In the simulations, we let the error process $\{\epsilon_{i}^{\ast}\}_{i=1}^{n}$ be i.i.d. standard normal random variables and consider the following two cases:

• Case 1: A single change point.

[TABLE]

• Case 2: Two change points.

[TABLE]

The estimates of the locations of change points are compared in terms of their mean absolute deviation errors (MADE) and mean squared errors (MSE). We also report the simulated percentage of correctly estimating the number of change points. The results are listed in Table 3. One can see from Table 3 that the values of MADE and MSE are all quite small, which suggests the estimated locations by our approach are fairly accurate. Furthermore, as the sample size increases, the percentage of correctly estimating the number of change points increases in both cases.
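For concreteness, the three summary measures can be computed as in the sketch below. The function names and the toy numbers are hypothetical; they are not values from Table 3.

```python
import numpy as np

def location_errors(estimates, true_location):
    """Return (MADE, MSE) of location estimates across replications."""
    dev = np.asarray(estimates, dtype=float) - true_location
    return float(np.mean(np.abs(dev))), float(np.mean(dev ** 2))

def correct_count_rate(m_hats, m_true):
    """Fraction of replications in which the estimated number equals the truth."""
    return float(np.mean(np.asarray(m_hats) == m_true))

# Toy replications (illustrative numbers only).
made, mse = location_errors([0.02, -0.01, 0.03, -0.04], true_location=0.0)
rate = correct_count_rate([1, 1, 2, 1, 0], m_true=1)
```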

5.4. Comparison to threshold testing and estimation in threshold model

In this subsection, we compare the accuracy and sensitivity of our nonparametric method with existing threshold testing and estimation methods for the classic threshold AR (TAR) model proposed by Tong & Lim (1980) when the TAR model is indeed the underlying data generating mechanism. We consider the following two-regime TAR(1) model

[TABLE]

where $\kappa_{2}=0.5,0.3,0.1,-0.1,-0.3,-0.5$ and the error process $\epsilon_{i}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\mathcal{N}(0,0.75^{2})$ . First, we are interested in comparing the accuracy and power of our nonparametric test with the parametric $F$ -test of threshold nonlinearity proposed in Tsay (1989). Table 4 shows the testing results for nonlinearity of the model based on both the parametric and nonparametric methods, in which the sample size is $n=800$ and the number of bootstrap samples is $B=2000$ .

We observe that the nonparametric method has slightly higher power when the scale coefficient $\kappa_{2}$ deviates only slightly from 0.5. However, as $\kappa_{2}$ becomes $0.1$ or smaller, the parametric method has higher power than the nonparametric method.

In addition, we compare the accuracy in change point estimation. We study the following TAR(1) model,

[TABLE]

where $\epsilon_{i}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\mathcal{N}(0,0.75^{2})$. Note that parametric estimation of the threshold value of the above two-regime TAR(1) process can be done via the R function uTAR in the NTS package (we refer to Liu et al. (2020) for more details). The simulated MADEs and MSEs are listed in Table 5. From Table 5, one can see that both methods provide relatively accurate estimates of the location of the change point (threshold). The parametric method yields more accurate estimates than the nonparametric method. Together with the above observations, this shows that the parametric method is better for testing and detecting change points for the TAR model when the model is well-specified. This result is not surprising, since testing sensitivity and estimation accuracy tend to be higher when the model is correctly restricted to a (smaller) parametric class.

6. Illustrative example

In this section, we consider the daily new confirmed cases of Coronavirus disease of 2019 (COVID-19) in Germany. The dataset contains $156$ observations from April 28th to September 30th of 2020 and can be downloaded from https://ourworldindata.org/coronavirus-source-data. According to the COVID-19 timeline, Germany registered its first case on January 28th and later suffered an outbreak of the pandemic from mid-March to late April. In the data analysis, we select the aforementioned time span between the first and second waves of COVID-19 so that the time series is approximately stationary. Let $X_{i}$ be the logarithm of confirmed cases at day $i=1,...,156$ and $Y_{i}=X_{i+1}-X_{i}$. The sample path of $\{X_{i}\}$ and the ADCF plot of $\{X_{i}\}$ are shown in Fig. 3. Both plots in Fig. 3 suggest that the time series is approximately stationary and has a moderate seasonal dependence with period $S=7$. The seasonal behaviour probably comes from the reporting lag during weekends, which happens in almost every country. We consider the state-domain nonlinear regression model (which is equivalent to Eq. 1):

[TABLE]

where $\{\epsilon_{i}\}$ is a martingale difference sequence. In this application, $\mu(x)$ represents the expected increase or decrease in percentage of COVID-19 cases in day $i$ when $X_{i-1}=x$ .
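The construction of the regression pairs from raw case counts can be sketched as below. The function name and the three counts in the demo are made up for illustration and are not taken from the German dataset.

```python
import numpy as np

def log_growth_series(daily_cases):
    """Build X_i = log(cases_i) and Y_i = X_{i+1} - X_i from daily counts."""
    cases = np.asarray(daily_cases, dtype=float)
    if np.any(cases <= 0):
        raise ValueError("logarithm requires strictly positive case counts")
    x = np.log(cases)
    y = np.diff(x)            # y[i] = x[i + 1] - x[i]
    return x[:-1], y          # aligned pairs (X_i, Y_i)

# Made-up counts growing by 10% per day (illustrative only).
x_demo, y_demo = log_growth_series([500, 550, 605])
```

Because $Y_{i}$ is a difference of logarithms, it approximates the relative day-over-day change in confirmed cases.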

We apply the proposed method to test whether $\mu(\cdot)$ contains any change points. We choose $T=[l,u]=[5.7,7.5]$, which includes $82.69\%$ of the $X_{i}$, so that data are relatively abundant in this region and the test is expected to be accurate. According to the leave-one-out cross-validation criterion, the selected bandwidths $b$ and $h$ are $0.446$ and $0.40$, respectively. Following the practical implementation in Section 4.1, we calculate the empirical $99\%$ quantile of $\Pi_{n}^{\ast}$ with 10000 bootstrap samples, which gives $C_{n,\alpha}=1.596$. Next, we investigate the behaviour of the test statistic, which is shown in Fig. 4. Our test rejects the null hypothesis of continuity of $\mu(\cdot)$ at the $1\%$ level and flags two change points at $\hat{x}_{1}=6.83$ and $\hat{x}_{2}=7.40$.

Note that $Y_{i}$ can be viewed as the conditional daily growth rate for COVID-19. For comparison, we also use the nonparametric local polynomial method to fit $\mu(x)$ assuming that there is no change point. The corresponding estimated regression function $\mu_{n}(x)$ over $[5.7,7.5]$ is plotted on the left hand side of Fig. 5. On the right hand side of Fig. 5 we plot the fitted drift function $\mu_{n}(x)$ with the knowledge of the change points. The difference between the two plots in Fig. 5 suggests that, with or without the knowledge of change points, our understanding of the relationship between $Y_{i}$ and $X_{i}$ can be quite different. With the knowledge of the change points, we can see that two large jumps exist at $x_{1}=6.83$ and $x_{2}=7.40$, which shows that the growth rate changes abruptly at these two points.

It is clear from the right plot of Fig. 5 that those two change points divide the state domain into three regimes/phases. Furthermore, the latter plot indicates that the nonlinear dynamics can be approximated by a three-regime threshold model with the data generating mechanism switching at the detected change points. Additionally, according to the timeline, we can identify the periods corresponding to each phase. The first phase $x\in[5.7,6.83)$ contains May 3-5, 10-11, May 15-August 5, August 9-10, 16-17, 23-24 and 30-31, where the trajectory depicts a relatively inactive period of the virus transmission and the conditional infection rate $\mu(x)$ decreases from positive to negative as $x$ increases. The second phase $x\in[6.83,7.4)$ includes April 28-30, May 6-9, August 7-9, 11-15, 25-29, September 1-6, 8-9, 13-15 and 20-21, where the conditional infection rate jumps up when $x$ surpasses $6.83$ and then decreases gradually again. The third phase $x\in[7.4,7.83]$ corresponds to August 20-21, September 16-19 and 22-26, where a sudden large increase in the conditional infection rate can be found at the left boundary, after which the rate decreases sharply, possibly due to strong governmental interventions.

In summary, the analyzed period from April 28th to September 30th of 2020 of German COVID-19 data shows a complicated nonlinear dynamic balance between disease transmission and government intervention. The proposed method of the paper could help understand this complex nonlinear dynamics by determining the boundaries of phases where the state-domain relationship changes abruptly and subsequently segment the time series into multiple regimes.

Acknowledgments. The authors are grateful to the editor, Prof. Serena Ng, and two anonymous referees for their valuable comments and suggestions which significantly improved the quality of the paper. Zhou Zhou’s research has been partially sponsored by NSERC (No.489079). Yan Cui is supported by the China Scholarship Council (No.201806170148).

7. Proofs of main results

7.1. Proof of Theorem 3.2

The outline of the proof is as follows. Firstly, we use the following decomposition of $X_{i}$

[TABLE]

and prove the results involving the first two terms. This is given in Section 7.1.1.

Secondly, we use a technique called $m$ -dependent approximation to approximate the martingale $\{\epsilon_{i}\}$ using $\{\mathbb{E}[\epsilon_{k}\mid\xi_{i,i-m}]-\mathbb{E}[\epsilon_{k}\mid\xi_{i-1,i-m}]\}$ , where $\xi_{k_{1},k_{2}}:=(\eta_{k_{1}},\dots,\eta_{k_{2}})$ , for a properly chosen order $m\rightarrow\infty$ , which simplifies the sum of a sequence of dependent random variables to a corresponding sum of $m$ -dependent random variables. This is done in Section 7.1.2.

Thirdly, we divide the sequence of $n$ ( $m$ -dependent) random variables into alternating big and small blocks, where the length of big blocks has a slightly higher order than that of the small blocks. Furthermore, the length of the small blocks is larger than $m$ . Using this proof technique, we can approximate the sum of $n$ ( $m$ -dependent) random variables using the sum of the subsequence which includes the random variables residing in the big blocks. Since the length of small blocks is larger than $m$ , the $m$ -dependent random variables in different big blocks are independent. This part of the proof is given in Section 7.1.3.

Fourthly, we only need to deal with a sequence of independent sums of random variables within each big block. In order to get prepared for using the multivariate Gaussian approximation result by Zaitsev (1987), we first compute the asymptotic covariance structure of the sequence of independent sums. This is given in Section 7.1.4.

In the final two steps, we first apply the multivariate Gaussian approximation by Zaitsev (1987), which is given in Section 7.1.5, and then prove the convergence to the Gumbel distribution, which is given in Section 7.1.6. The techniques used in these two steps depend heavily on existing work, in particular that of Zhao & Wu (2008) and Liu & Wu (2010), which in turn applied the work of Bickel & Rosenblatt (1973) and Rosenblatt (1976).

7.1.1. Decomposition

First, we substitute $X_{i}=\mu(X_{i-1})+\epsilon_{i}$ into $t_{n}(x)$ and separate the terms involving $K$ and $K^{*}$ . We first focus on the term involving $K$ only. That is,

[TABLE]

Next it is easy to see that by the definition of $w(x,b)$ , the second term of the decomposition on the right hand side of Eq. 41 equals $\mu(x)$ . For the first term of the decomposition in Eq. 41, following exactly the proof of (Liu & Wu, 2010, Lemma 5.2), uniformly over $x$ , we have that

[TABLE]

where $\tau_{n}:=\sqrt{\frac{b\log n}{n}}+b^{4}+\frac{b}{n}\sqrt{\sum_{k=-n}^{\infty}(\Theta_{n+k}-\Theta_{k})^{2}}$ comes from (Zhao & Wu, 2008, Lemma 2(ii)), and in the last equality we have applied the assumptions on $b$ and $\sum_{k=-n}^{\infty}(\Theta_{n+k}-\Theta_{k})^{2}$ to get $\frac{b}{n}\sqrt{\sum_{k=-n}^{\infty}(\Theta_{n+k}-\Theta_{k})^{2}}=\mathcal{O}(\sqrt{b\log n/n})$ .

7.1.2. $m$ -dependent approximation

For the third term of the decomposition in Eq. 41, recalling that we have defined the notation $\xi_{k_{1},k_{2}}:=(\eta_{k_{1}},\dots,\eta_{k_{2}})$ , we consider the decomposition of $\epsilon_{k}$ ,

[TABLE]

where $m=\lfloor n^{\tau}\rfloor$ with $\tau<1-\delta_{1}$. The first and last terms in the decomposition are negligible compared to the second term. To see this, consider

[TABLE]

which implies $\|\mathbb{E}[\epsilon_{k}\mid\xi_{k-1,k-m}]\|_{p}=\mathcal{O}\left(\sum_{i=m}^{\infty}\rho^{i}\right)=\mathcal{O}(\rho^{m})$ . Since $m>(\log n)^{2}$ , we have

[TABLE]

Similarly, one can verify in the same way that

[TABLE]

Furthermore, since the martingale differences are uncorrelated, we have

[TABLE]

Therefore, defining

[TABLE]

we have

[TABLE]

Next, following exactly the proof of (Liu & Wu, 2010, Lemma 5.3), we get that uniformly over $x$

[TABLE]

Following the above arguments again we can compute the orders for the decomposition of the term involving $K^{*}$ and get $t_{n}(x)$ by the differences. Note that many terms such as $\mu(x)$ in the second term and $\mathcal{O}(b^{2})$ term in the first term cancel out. Therefore, overall it can be easily verified that

[TABLE]

where $\tilde{K}(\cdot)$ is an anti-symmetric kernel defined by

[TABLE]

Now to prove Theorem 3.2, it suffices to show

[TABLE]

where

[TABLE]

Note that we have $\mathbb{E}[\zeta_{i}]=0$ and $\mathbb{E}[\zeta_{i}^{2}]=1$ . Next, we define a truncated version of $\zeta_{i}$ by

[TABLE]

We next define $\tilde{M}_{n}(x)$ using $m$ -dependent conditional expectations

[TABLE]

where $\breve{\sigma}^{2}:=\mathbb{E}\breve{\zeta}_{1}^{2}$ .

7.1.3. Alternating big and small blocks

Recall that $m=\lfloor n^{\tau}\rfloor$ . We choose $\tau_{1}$ such that $\tau<\tau_{1}<1-\delta_{1}$ and split $[1,n]$ into alternating big and small blocks $H_{1},I_{1},\cdots,H_{\iota_{n}},I_{\iota_{n}},I_{\iota_{n}+1}$ with length $|H_{i}|=\lfloor n^{\tau_{1}}\rfloor$ , $|I_{i}|=\lfloor n^{\tau}\rfloor$ , $\forall 1\leq i\leq\iota_{n}$ , and $|I_{\iota_{n}+1}|=n-\iota_{n}(\lfloor n^{\tau_{1}}\rfloor+\lfloor n^{\tau}\rfloor)$ . Note that $\iota_{n}=\lfloor n/(\lfloor n^{\tau_{1}}\rfloor+\lfloor n^{\tau}\rfloor)\rfloor$ . Then we define

[TABLE]

Then we define

[TABLE]
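The block layout described above can be made concrete with the following sketch, which only reproduces the index bookkeeping ($|H_{i}|=\lfloor n^{\tau_{1}}\rfloor$, $|I_{i}|=\lfloor n^{\tau}\rfloor$, and a final remainder block $I_{\iota_{n}+1}$). The function name and the numerical values of $n$, $\tau$ and $\tau_{1}$ are hypothetical.

```python
import math

def block_partition(n, tau, tau1):
    """Split {1, ..., n} into alternating big blocks H_i and small blocks I_i."""
    big = math.floor(n ** tau1)          # |H_i|
    small = math.floor(n ** tau)         # |I_i|
    iota = n // (big + small)            # number of (big, small) pairs
    big_blocks, small_blocks = [], []
    start = 1
    for _ in range(iota):
        big_blocks.append((start, start + big - 1))
        start += big
        small_blocks.append((start, start + small - 1))
        start += small
    if start <= n:                       # remainder block I_{iota + 1}
        small_blocks.append((start, n))
    return big_blocks, small_blocks

# Illustrative values with tau < tau1 (not the orders used in the proof).
H, I = block_partition(n=1000, tau=0.3, tau1=0.6)
```

Since every small block separating two big blocks has length $\lfloor n^{\tau}\rfloor=m$, sums of the $m$-dependent terms over different big blocks can be treated as independent, which is the property the proof exploits.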

Next we show in the following that we can approximate $M_{n}(x)$ by $\tilde{M}_{n}(x)$ and then approximate $\tilde{M}_{n}(x)$ by $\widetilde{M}_{n}(x)$ . That is, we show

[TABLE]

To show the approximations above, we first follow the proof of (Liu & Wu, 2010, Lemma 5.1), using Freedman’s inequality for martingale differences (Freedman, 1975), to get

[TABLE]

which implies we can approximate $M_{n}(x)$ by replacing $\zeta_{k}$ with $\breve{\zeta}_{k}$ in the definition of $M_{n}(x)$ .

Next, we write $K\left(\frac{X_{k-1}-x}{b}\right)$ as a sum of three terms

[TABLE]

Note that $\breve{\zeta}_{k}$ is uncorrelated with the second term of the right hand side of Eq. 64. Next, we show that under our assumptions on physical dependence measure, the first term of the right hand side of Eq. 64 becomes very small for large $m$ . In order to rigorously prove this fact, defining

[TABLE]

we first approximate $\sum_{k=1}^{n}Z_{k}(x)$ by the skeleton process $\sum_{k=1}^{n}Z_{k}(x_{j}),1\leq j\leq q_{n}$, where $q_{n}=\lfloor n^{2}/b\rfloor$ and $x_{j}=j/(bq_{n})$. Following the same arguments as in (Liu & Wu, 2010, Proof of Lemma 4.2), using Freedman’s inequality for martingale differences (Freedman, 1975), we have

[TABLE]

Next, we show that $\sup_{x\in T}\mathbb{E}|Z_{k}(x)|$ decays exponentially in $m$. We consider the two cases $|X_{k-1}-\mathbb{E}(X_{k-1}\,|\,\xi_{k-1,k-m})|\geq\rho_{1}^{m}$ and $|X_{k-1}-\mathbb{E}(X_{k-1}\,|\,\xi_{k-1,k-m})|<\rho_{1}^{m}$, where $\rho_{1}=\frac{1+\rho}{2}$. Using the assumption $\theta_{n,p}=\mathcal{O}(\rho^{n})$, we have

[TABLE]

Now, we can show that the maximum of the skeleton process over $\{x_{j}\},j=1,\dots,q_{n}$ is small. Recalling that $m$ grows polynomially in $n$, we have

[TABLE]

Next, we show the third term of the decomposition of $K\left(\frac{X_{k-1}-x}{b}\right)$ in Eq. 64 can also be ignored. In order to show this, we define

[TABLE]

Using the same argument as in (Liu & Wu, 2010, Proof of Lemma 4.2), we can approximate $N_{n}(x)$ by its skeleton process, since $\sup_{x_{j-1}\leq x\leq x_{j}}\left|N_{n}(x)-N_{n}(x_{j})\right|=o_{\mathbb{P}}((\log n)^{-2})$. We first approximate $\sup_{x}|N_{n}(x)|$ by the maximum over the skeleton process. Then we have $\mathbb{P}\left(\max_{1\leq j\leq q_{n}}|N_{n}(x_{j})|\geq(\log n)^{-2}\right)=o(1)$ using Freedman’s inequality for martingale differences (Freedman, 1975). Therefore, we can approximate $M_{n}(x)$ by

[TABLE]

Furthermore, since $|1-\mathbb{E}[\breve{\zeta}_{k}^{2}]/\mathbb{E}[\zeta_{k}^{2}]|=\mathcal{O}((\log n)^{-12/(p-2)})$ , we can replace $\breve{\zeta}_{k}/\mathbb{E}[\zeta_{k}^{2}]$ by $\breve{\zeta}_{k}/\breve{\sigma}^{2}$ , which leads to the definition of $\tilde{M}_{n}(x)$ . Therefore, we have proved

[TABLE]

Therefore, in order to finish the proof of Eq. 62, it suffices to show

[TABLE]

where $R_{n}(x):=\frac{1}{nb}\sum_{j\in\cup_{i=1}^{\iota_{n}+1}I_{i}}u_{j}(x)$. Following the same skeleton-process argument as above, we only need to consider the grid $\{x_{j},j=0,\dots,q_{n}\}$. Using the fact that $\tau<\tau_{1}$ and $n^{-\delta_{1}}=\mathcal{O}(b)$, again by Freedman’s inequality for martingale differences, we have, for some constant $C$,

[TABLE]

which finishes the proof of Eq. 62.

Observe that, since $K(\cdot)$ is supported on $[0,1]$, one of the following two terms must be zero:

[TABLE]

Hence, defining $\widetilde{M}_{n}^{*}(x)$ similarly to $\widetilde{M}_{n}(x)$ but with $K^{*}(\cdot)$ in place of $K(\cdot)$, by Eq. 62, we only need to focus on the following term

[TABLE]

Clearly, in order to complete the proof of Theorem 3.2, it suffices to show

[TABLE]

7.1.4. Asymptotic covariance structure

Next, we prove some results on the asymptotic covariance structure of $\{\hat{M}_{n}(x)\}$, which will be needed later for the Gaussian approximation using the results in Bickel & Rosenblatt (1973). Define the following quantities: $r(s):=\int K(x)K(x+s)\mathrm{d}x/\lambda_{K}$, $\hat{r}(s):=\mathbb{E}\hat{M}_{n}(x)\hat{M}_{n}(x+s)$, $\tilde{r}(s):=\int\tilde{K}(x)\tilde{K}(x+s)\mathrm{d}x/\lambda_{\tilde{K}}$, and $\tilde{K}_{2}:=\int_{-1}^{1}(\tilde{K}^{\prime}(x))^{2}\mathrm{d}x/(2\lambda_{\tilde{K}})$. Note that since $\tilde{K}^{\prime}(0)>0$, we have $\int K(u)K^{*}(u\pm s)\mathrm{d}u=\mathcal{O}(\int_{0}^{|s|}x(|s|-x)\mathrm{d}x)=\mathcal{O}(|s|^{3})=o(|s|^{2})$. Then by the definition of $\tilde{r}(s)$, using $\lambda_{\tilde{K}}=2\lambda_{K}$, we have

[TABLE]

Next, according to (Bickel & Rosenblatt, 1973, Theorems B1 and B2), we have $r(s)=1-K_{2}|s|^{2}+o(|s|^{2})$ . Note that

[TABLE]

This implies $\tilde{r}(s)=1-\tilde{K}_{2}|s|^{2}+o(|s|^{2})$ , which can also be obtained directly from (Bickel & Rosenblatt, 1973, Theorems B1 and B2).
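A quadratic expansion of this type can be checked numerically. The sketch below uses the symmetric Epanechnikov kernel on $[-1,1]$ as a stand-in (it is not the kernel $K$ or $\tilde{K}$ of the paper) and assumes the second-order constant is defined analogously to $\tilde{K}_{2}$, namely $\int (W^{\prime})^{2}/(2\int W^{2})$. The grid resolution is an arbitrary choice.

```python
import numpy as np

def epanechnikov(x):
    """Stand-in kernel W(x) = 0.75 (1 - x^2) on |x| <= 1."""
    return 0.75 * (1.0 - x ** 2) * (np.abs(x) <= 1.0)

def epanechnikov_deriv(x):
    """Derivative W'(x) = -1.5 x on |x| <= 1."""
    return -1.5 * x * (np.abs(x) <= 1.0)

grid = np.linspace(-3.0, 3.0, 600001)
dx = grid[1] - grid[0]

# lambda = int W^2 and the second-order constant int (W')^2 / (2 lambda),
# both approximated by Riemann sums on the grid.
lam = np.sum(epanechnikov(grid) ** 2) * dx
k2 = np.sum(epanechnikov_deriv(grid) ** 2) * dx / (2.0 * lam)

def r(s):
    """Normalized autocovariance r(s) = int W(x) W(x + s) dx / lambda."""
    return np.sum(epanechnikov(grid) * epanechnikov(grid + s)) * dx / lam

s = 0.05
gap = abs(r(s) - (1.0 - k2 * s ** 2))   # should be of smaller order than s^2
```

For this stand-in kernel, `lam` is close to $0.6$ and `k2` is close to $1.25$, and the gap between $r(s)$ and $1-K_{2}s^{2}$ at $s=0.05$ is small relative to $s^{2}$.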

Next, we show $\hat{r}(s)=\tilde{r}(s)+\mathcal{O}(b)$ . Note that $\{\breve{\zeta}_{k}\}$ are uncorrelated and $\mathbb{E}{\breve{\zeta}_{k}}=0$ . Then, using $|f(v+s)-\sqrt{f(t)f(s)}|=\mathcal{O}(b)$ uniformly over $|s-t|\leq 2b$ and $|v|\leq 2b$ , we have

[TABLE]

Therefore, we have proved that, as $s\to 0$ ,

[TABLE]

7.1.5. Gaussian approximation

Now, we go back to prove Eq. 77. We use similar techniques as in (Liu & Wu, 2010, Proof of Lemma 4.5). First, as in Bickel & Rosenblatt (1973), we split the interval $T$ into alternating big and small intervals $W_{1},V_{1},\dots,W_{N},V_{N}$ , where $W_{i}=[a_{i},a_{i}+w]$ , $V_{i}=[a_{i}+w,a_{i+1}]$ , $a_{i}=(i-1)(w+v)$ , and $N=\lfloor(u-l)/(w+v)\rfloor$ . We let $w$ be fixed, and $v$ be small which goes to [math]. Since $u$ and $l$ are fixed numbers, without loss of generality, we assume $l=0$ and $u=1$ in this proof.

Next, we approximate $\Omega^{+}:=\sup_{0\leq t\leq 1}\hat{M}_{n}(t)$ by big blocks $\{W_{k}\}$, that is, by $\Psi^{+}:=\max_{1\leq k\leq N}\Upsilon^{+}_{k}$, where $\Upsilon^{+}_{k}:=\sup_{t\in W_{k}}\hat{M}_{n}(t)$. Then we further approximate $\Upsilon^{+}_{k}$ via discretization by $\Xi_{k}^{+}:=\max_{1\leq j\leq\chi}\hat{M}_{n}(a_{k}+jax^{-1})$, where $\chi=\lfloor wx/a\rfloor$ with $a>0$. We define $\Omega^{-}$, $\Psi^{-}$, $\Upsilon^{-}_{k}$, and $\Xi_{k}^{-}$ similarly by replacing $\sup$ or $\max$ by $\inf$ or $\min$, respectively. Letting $\Omega=\max(\Omega^{+},-\Omega^{-})=\sup_{0\leq t\leq 1}|\hat{M}_{n}(t)|$ and $x_{z}=d_{n}+z/(2\log b^{-1})^{1/2}$, we have

[TABLE]

where

[TABLE]

Next, we are ready to apply the Gaussian approximation. We first use discretization for approximating $\hat{M}_{n}(x)$. Let $s_{j}=j/(\log n)^{6},1\leq j<t_{n}$, where $t_{n}=1+\lfloor(\log n)^{6}t\rfloor$, $s_{t_{n}}=t$. Write $[s_{j-1},s_{j}]=\bigcup_{k=1}^{q_{n}}[s_{j,k-1},s_{j,k}]$, where $q_{n}=\lfloor(s_{j}-s_{j-1})n^{2}\rfloor=\lfloor n^{2}/(\log n)^{6}\rfloor$ and $s_{j,k}-s_{j,k-1}=(s_{j}-s_{j-1})/q_{n}$. Following the same arguments as in (Liu & Wu, 2010, Proof of Lemma 4.6), the following discretization approximation holds for all large enough $Q$,

[TABLE]

Next, we apply the multivariate Gaussian approximation from Zaitsev (1987). To this end, similar to the definition of $u_{j}(t)$ , we first define

[TABLE]

Note that the random variables $\{\tilde{u}_{j}(t),j=1,\cdots,\iota_{n}\}$ are independent. Then we define

[TABLE]

Now we introduce $\widehat{M}_{n}(t):=\frac{1}{\sqrt{nb\lambda_{\tilde{K}}f(t)}}\sum_{j=1}^{\iota_{n}}\widehat{u}_{j}(t)$ . Then using (Zaitsev, 1987, Theorem 1.1) as well as $\operatorname*{\mathrm{sup}\vphantom{\mathrm{infsup}}}_{t}\operatorname*{\mathrm{max}\vphantom{\mathrm{infsup}}}_{1\leq j\leq\iota_{n}}\|\widehat{u}_{j}(t)-\tilde{u}_{j}(t)\|\leq Cn^{-Q}$ for large enough $Q$ , we have

[TABLE]

where $(Y_{n}(1),\dots,Y_{n}(t_{n}))$ is a centered Gaussian random vector with covariance matrix $\widehat{\Sigma}_{n}=\textrm{Cov}(\widehat{M}_{n}(v+s_{1}),\dots,\widehat{M}_{n}(v+s_{t_{n}}))$ .

Let $\psi$ be the density function of standard Gaussian, and $H_{2}(a)$ be the Pickands constants (Bickel & Rosenblatt, 1973, Theorem A1, Lemma A1, and Lemma A3). Using Eq. 85, let $t>0$ be such that $\operatorname*{\mathrm{inf}\vphantom{\mathrm{infsup}}}\{s^{-2}(1-\tilde{r}(s)):0\leq s\leq t\}>0$ . Following exactly the arguments in (Liu & Wu, 2010, Proof of Lemma 4.6) to apply (Bickel & Rosenblatt, 1973, Lemma A3 and Lemma A4), we can obtain that for $a>0$ ,

[TABLE]

uniformly over $0\leq v\leq 1$ . The limit when $a\to 0$ also holds, that is

[TABLE]

where we have used the Pickands constants $H_{2}=\operatorname*{\mathrm{lim}\vphantom{\mathrm{infsup}}}_{a\to 0}H_{2}(a)/a=1/\sqrt{\pi}$ . The left tail version of the tail bounds also holds with $\geq x$ replaced by $\leq x$ . Furthermore, we can show through elementary calculations that

[TABLE]

Therefore, it suffices to show the following convergence to Gumbel law

[TABLE]

7.1.6. Convergence to Gumbel distribution

The main steps of the proof of Eq. 100 are as follows. First, we approximate $\hat{M}_{n}(t)$ by $Y_{n}(t)$ . Then, we approximate $Y_{n}(t)$ by another quantity $\hat{M}_{n}^{\prime}(t)$ which is defined similarly to $\hat{M}_{n}(x)$ but using a sequence of i.i.d. random variables instead of the dependent time series $\{X_{k}\}$ . Finally, we apply (Rosenblatt, 1976, Theorem) to show convergence to Gumbel distribution.

We define

[TABLE]

where $Y_{n}(\cdot)$ is a centered Gaussian process with covariance function

[TABLE]

First we approximate $\hat{M}_{n}(t)$ using $Y_{n}(t)$ . Recall that $w$ and $v$ are the lengths of big and small blocks $W_{i}$ and $V_{i}$ . Let $N=\lfloor 1/(w+v)\rfloor$ . Define a different truncation order for $M_{n}(t)$ by $\widehat{M}^{\prime}_{n}(t):=\frac{1}{\sqrt{nb\lambda_{\tilde{K}}f(t)}}\sum_{j=1}^{\iota_{n}}\widehat{u}^{\prime}_{j}(t)$ for given $d$ , where

[TABLE]

Then, using $\widehat{M}^{\prime}_{n}(t)$ and following exactly the argument in (Liu & Wu, 2010, Proof of Lemma 4.10), we get that, for any fixed integer $l$ with $1\leq l\leq N/2$,

[TABLE]

where $\mathbf{A}_{k}:=\bigcup_{j=1}^{\lfloor wx/a\rfloor}\mathbf{B}_{k,j}$ , $\mathbf{C}_{k}:=\bigcup_{j=1}^{\lfloor wx/a\rfloor}\mathbf{D}_{k,j}$ , $C$ does not depend on $l$ , and

[TABLE]

Next, we construct $\hat{M}_{n}^{\prime}(t)$ in the following way. Let $\{\eta_{i}^{(k)}\},1\leq k\leq n,$ be i.i.d. copies of $\{\eta_{i}\}$, and $\xi_{j}^{(k)}=(\dots,\eta_{j-1}^{(k)},\eta_{j}^{(k)})$. Let $X_{j}^{(k)}=G(\xi_{j}^{(k)})$. Note that $X_{k}^{(k)},1\leq k\leq n,$ are i.i.d. Now define $\mathbf{A}_{k}^{\prime}$ in the same way as $\mathbf{A}_{k}$, but with $Y_{j}$ and $\{\eta_{i}\}$ replaced by $X_{k}^{(k)}$ and $\{\eta_{i}^{(k)}\}$, respectively. Repeating the arguments used to obtain Eq. 104, we have

[TABLE]

Letting $n\to\infty$ and then $l\to\infty$, by the triangle inequality, we have

[TABLE]

The key observation is that the $\mathbf{A}_{k}^{\prime}$ are defined using $\{X_{k}^{(k)}\}$, which are i.i.d., so it suffices to work with $\{\mathbf{A}_{k}^{\prime}\}$. Next, we define $R_{1}^{\prime}$ to $R_{4}^{\prime}$ in the same way as $R_{1}$ to $R_{4}$, but using $\{X_{k}^{(k)}\}$ and $\{\eta_{i}^{(k)}\}$ instead of $\{X_{k}\}$ and $\{\eta_{i}\}$. Then, by Eq. 98 and elementary calculations again, we have $\lim_{a\to 0}\limsup_{v\to 0}\limsup_{n\to\infty}R_{j}^{\prime}=0$ for $j=1,\dots,4$. This implies

[TABLE]

where $\hat{M}_{n}^{\prime}(t)$ is defined in the same way as $\hat{M}_{n}(t)$ by replacing $\{X_{k}\}$ with $\{X_{k}^{(k)}\}$ , and $\{\eta_{i}\}$ with $\{\eta_{i}^{(k)}\}$ . Finally, since $\{X_{k}^{(k)}\}$ are i.i.d., we can apply (Rosenblatt, 1976, Theorem), which leads to the convergence of $\mathbb{P}\left(\operatorname*{\mathrm{sup}\vphantom{\mathrm{infsup}}}_{0\leq t\leq 1}\left|\hat{M}_{n}^{\prime}(t)\right|<x_{z}\right)$ to $e^{-2e^{-z}}$ . This completes the proof of Theorem 3.2.
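To make the limiting law concrete, the following Python sketch inverts $e^{-2e^{-z}}$ to find the value $z_{\alpha}$ at which the limit equals $1-\alpha$. The function names are ours and purely illustrative. The sketch only handles the Gumbel-type limit itself: it does not compute the centering $d_{n}$ in $x_{z}=d_{n}+z/(2\log b^{-1})^{1/2}$, and it is separate from the bootstrap procedure of Section 4.1.

```python
import math


def gumbel_limit_cdf(z: float) -> float:
    """Limiting probability exp(-2 exp(-z)) of the event {sup |M_n'(t)| < x_z}."""
    return math.exp(-2.0 * math.exp(-z))


def gumbel_critical_z(alpha: float) -> float:
    """Solve exp(-2 exp(-z)) = 1 - alpha for z (illustrative helper)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    # exp(-2 exp(-z)) = 1 - alpha  <=>  z = -log(-log(1 - alpha) / 2)
    return -math.log(-0.5 * math.log1p(-alpha))


z_05 = gumbel_critical_z(0.05)
print(z_05, gumbel_limit_cdf(z_05))
```

The factor $2$ in the exponent reflects that the supremum of the absolute value combines the upper and lower tails.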

7.2. Proof of Theorem 3.5

First, we fix notation. For positive sequences $r_{n}$ and $s_{n}$, we write $r_{n}=\Omega(s_{n})$ if $s_{n}=o(r_{n})$, and $r_{n}=\Theta(s_{n})$ if both $s_{n}=\mathcal{O}(r_{n})$ and $r_{n}=\mathcal{O}(s_{n})$ hold. Note that

[TABLE]

We first argue that $\mathbb{P}\left(\hat{M}<M\right)\to 0$. The event $\hat{M}<M$ implies that at least one change point has not been detected, so we can write

[TABLE]

Then, by the validity of the bootstrap procedure, when $\sqrt{\frac{b\log n}{n}}=o(\tilde{\Delta}_{n})$ , the power of the test goes to $1$ as $n\to\infty$ which implies that for any $i$ ,

[TABLE]

This shows that $\mathbb{P}\left(\hat{M}<M\right)\to 0$.

Next we argue that $\mathbb{P}\left(\hat{M}>M\right)\to\alpha$. Note that $\hat{M}>M$ implies that there is a set $\tilde{T}$ containing no change point on which $\sup_{x\in\tilde{T}}|t_{n}(x)|\geq C_{n,\alpha}$. By our algorithm, we can take $\tilde{T}$ to be the largest set obtained by removing $M$ intervals from $[l,u]$, each of length $2b$ and containing one change point. Since $M$ is a fixed constant and $b\to 0$, we have $|\tilde{T}|=(|u-l|-2Mb)^{+}\to|u-l|$. We can then apply our main result, Theorem 3.2, on $\tilde{T}$ to get $\mathbb{P}\left(\sup_{x\in\tilde{T}}|t_{n}(x)|\geq C_{n,\alpha}\right)\to\alpha$, which implies $\mathbb{P}\left(\hat{M}>M\right)\to\alpha$.
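The interval-removal construction in this argument can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the estimation algorithm of Section 3.2: we assume a greedy search that repeatedly takes the largest $|t_{n}(x)|$ on the remaining grid, compares it with a threshold playing the role of $C_{n,\alpha}$, and removes a window of half-width $b$ around each detected point. The names `greedy_change_points` and `remaining_length`, the grid, and the toy statistic are all hypothetical.

```python
from typing import List, Sequence


def greedy_change_points(grid: Sequence[float], stat: Sequence[float],
                         threshold: float, b: float) -> List[float]:
    """Hypothetical greedy search: detect a peak, remove its 2b-window, repeat."""
    active = [True] * len(grid)
    found: List[float] = []
    while True:
        candidates = [i for i, ok in enumerate(active) if ok]
        if not candidates:
            break
        i_max = max(candidates, key=lambda i: abs(stat[i]))
        if abs(stat[i_max]) < threshold:
            break  # no remaining point exceeds the threshold
        found.append(grid[i_max])
        # Rule out the interval of length 2b centred at the detected point.
        for i, x in enumerate(grid):
            if abs(x - grid[i_max]) <= b:
                active[i] = False
    return sorted(found)


def remaining_length(l: float, u: float, n_removed: int, b: float) -> float:
    """Length (|u - l| - 2 M b)^+ left after removing M disjoint 2b-intervals."""
    return max(abs(u - l) - 2 * n_removed * b, 0.0)


# Toy statistic on [0, 1] with two triangular peaks (illustrative data only).
grid = [i / 100 for i in range(101)]
stat = [5 * max(0.0, 1 - abs(x - 0.3) / 0.05)
        - 4 * max(0.0, 1 - abs(x - 0.7) / 0.05) for x in grid]
points = greedy_change_points(grid, stat, threshold=3.0, b=0.05)
print(points, remaining_length(0.0, 1.0, len(points), 0.05))
```

As $b\to 0$ with the number of removed intervals fixed, `remaining_length` approaches $|u-l|$, which is the fact used to apply Theorem 3.2 on $\tilde{T}$.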

Therefore, we have $\mathbb{P}\left(\hat{M}=M\right)\to 1-\alpha$ . Then it suffices to show

[TABLE]

Since $M$ is finite, we only need to focus on one change point. Let $x_{0}$ be any one of the true change points and $\hat{x}$ be its estimate; it suffices to show $\mathbb{P}\left(|\hat{x}-x_{0}|\geq c_{n}\mid\hat{M}=M\right)\to 0$. Without loss of generality, we assume $\hat{x}-x_{0}=\hat{c}_{n}=o_{\mathbb{P}}(b)$ and $t_{n}(x_{0})>0$. The case $t_{n}(x_{0})<0$ can be shown using similar arguments. We now follow similar arguments as in Müller (1992). Define $\zeta(c):=t_{n}(x_{0}+c)-t_{n}(x_{0})$ for $c=o(b)$. Then we can write $\hat{c}_{n}=\arg\max\zeta(c)$. Therefore, it suffices to show $\hat{c}_{n}=\mathcal{O}_{\mathbb{P}}\left(\frac{1}{\tilde{\Delta}_{n}}\sqrt{\frac{b\log n}{n}}\right)$. Suppose $b$ is small enough that the $b$-neighborhood of $x_{0}$ does not include any other change point; we then apply the decomposition in Eq. 41. Since $x_{0}$ is a change point, we may assume without loss of generality that $\mu(x)$ is left continuous at $x=x_{0}$. Then the following term has order $\Theta(\tilde{\Delta}_{n})$:

[TABLE]

Furthermore, using $\int_{0}^{s}K(x)\mathrm{d}x=\Theta(s^{2})$, which holds because $\tilde{K}^{\prime}(0)>0$, and considering the cases $|X_{k-1}-x_{0}|\in[0,c]$ and $|X_{k-1}-x_{0}|\in(c,b]$ separately, we have

[TABLE]

Finally, by the assumptions on $K^{\prime}$ in Theorem 3.5, we can follow the same arguments as in the proof of Theorem 3.2, namely the $m$-dependent approximation (Section 7.1.2) and the alternating big/small blocks (Section 7.1.3), applied to $\tilde{K}^{\prime}$ instead of $\tilde{K}$, to get

[TABLE]

Furthermore, using the fact that $|\tilde{K}^{\prime\prime}(u)|$ is uniformly bounded and the mean value theorem, we have

[TABLE]

Next, we define a new kernel $\check{K}$ such that

[TABLE]

so we have $\mathbb{E}\left[\frac{1}{f(x)}\check{K}\left(\frac{X_{k-1}-x}{b}\right)^{2}\right]=\mathcal{O}(b)$. We can then approximate the following term using the same $m$-dependent approximation and alternating big/small blocks arguments as in Section 7.1.2 and Section 7.1.3 of the proof of Theorem 3.2, applied to this new kernel $\check{K}$, to get

[TABLE]

Therefore, we have

[TABLE]

Then using $\sqrt{\frac{\log n}{nb}}=o(\tilde{\Delta}_{n})$ we can conclude that

[TABLE]

Recall that the estimated change point $\hat{x}=x_{0}+\hat{c}_{n}$ , where $\hat{c}_{n}=\arg\operatorname*{\mathrm{max}\vphantom{\mathrm{infsup}}}\zeta(c)$ , then we have

[TABLE]

whenever $b^{4}=o((\log n)/(n\tilde{\Delta}_{n}))$ and $b^{3}=o((\log n)/(n\tilde{\Delta}_{n}^{2}))$. This is always true since we have assumed $\delta_{2}\leq 1/4$, which implies $b=\mathcal{O}(n^{-1/4})$, so $b^{4}=\mathcal{O}(1/n)=o((\log n)/n)$. Therefore, if we choose $c_{n}>0$ such that $\hat{c}_{n}=o_{\mathbb{P}}(c_{n})$, then we have $\mathbb{P}(|\hat{c}_{n}|\geq c_{n})\to 0$, which implies $\mathbb{P}\left(|\hat{x}-x_{0}|\geq c_{n}\mid\hat{M}=M\right)\to 0$.

8. Additional proofs

8.1. Proof of Remark 3.1

For $\sigma_{n}^{2}(x)$ , we first write it as the sum of three terms:

[TABLE]

For the first term, we first approximate $\epsilon_{k}^{2}$ by $\{\mathbb{E}[\epsilon_{k}^{2}\mid\xi_{k,k-m}]\}$ where $m=\lfloor n^{\tau}\rfloor$ with $\tau>0$ small enough. Using the same argument as in Section 7.1, we have

[TABLE]

where we choose $m=c\log n$ with $c>-\frac{1}{2\log(\rho)}$. We then divide $1,\dots,n$ into $\lfloor n/m\rfloor+1$ blocks indexed by $1,\cdots,\lfloor n/m\rfloor+1$. Then the blocks with odd indices are mutually independent, and so are the blocks with even indices. Following the same argument as in the proof of (Liu & Wu, 2010, Theorem 2.5) for each of the two subsequences of blocks, and using a union bound, we get

[TABLE]
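The blocking step can be sketched in Python as follows. This is an illustration under our own assumptions: the block length is taken as $m=\lceil c\log n\rceil$ so that it is an integer, and the constant $c$ is set to $-1/(2\log\rho)$ plus a hypothetical `slack` parameter so that the strict inequality holds. The function name `block_partition` is ours.

```python
import math


def block_partition(n: int, rho: float, slack: float = 1.0):
    """Split 1..n into consecutive blocks of length m and group them by parity."""
    if not 0.0 < rho < 1.0 or n < 2 or slack <= 0:
        raise ValueError("need 0 < rho < 1, n >= 2 and slack > 0")
    c = -1.0 / (2.0 * math.log(rho)) + slack   # any c > -1 / (2 log rho)
    m = max(1, math.ceil(c * math.log(n)))     # integer block length (assumption)
    blocks = [list(range(start, min(start + m, n + 1)))
              for start in range(1, n + 1, m)]
    odd = blocks[0::2]    # blocks with indices 1, 3, 5, ...
    even = blocks[1::2]   # blocks with indices 2, 4, 6, ...
    return m, odd, even


m, odd, even = block_partition(1000, rho=0.5)
# There are at most floor(n/m) + 1 blocks (one fewer when m divides n).
print(m, len(odd) + len(even), 1000 // m + 1)
```

Consecutive blocks of the same parity are separated by a full block of length $m$, which is the separation that blocking arguments for $m$-dependent sequences rely on.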

For the second term, we first approximate $\{\epsilon_{k}\}$ using $\{\epsilon_{k}^{\prime}\}$ , where $\epsilon_{k}^{\prime}:=\mathbb{E}[\epsilon_{k}\mid\xi_{k,k-m}]-\mathbb{E}[\epsilon_{k}\mid\xi_{k-1,k-m}]$ . Then following the same argument as in Section 7.1 we have

[TABLE]

Then, again choosing $m=c\log n$ and dividing $1,\cdots,n$ into $\lfloor n/m\rfloor+1$ blocks, by the same argument as in (Zhao & Wu, 2008, p. 1875), we get

[TABLE]

Finally, for the last term, we have

[TABLE]

Then, using $0<\delta_{1}<1/4$ we have that

[TABLE]

For $f_{n}(x)$ , similarly, by the same arguments as the proof for $\sigma_{n}^{2}(x)$ , following the proof of (Liu & Wu, 2010, Lemma 4.4), we can obtain $\operatorname*{\mathrm{sup}\vphantom{\mathrm{infsup}}}_{x}\left|f_{n}(x)-f(x)\right|=\mathcal{O}_{\mathbb{P}}\left(\frac{(\log n)^{3}}{\sqrt{nh}}+h^{2}\log n\right)$ .

8.2. Proof of Proposition 4.1

Since $\{U_{k}\}_{k=0}^{n}$ are i.i.d. standard Gaussian random variables, the proof of this proposition is simpler than that of Theorem 3.2. The convergence to the Gumbel distribution follows immediately from (Rosenblatt, 1976, Theorem 1).

Bibliography

Bickel, P. J. & Rosenblatt, M. (1973), ‘On some global measures of the deviations of density function estimates’, The Annals of Statistics 1 (6), 1071–1095.
Bollerslev, T., Law, T. H. & Tauchen, G. (2008), ‘Risk, jumps, and diversification’, Journal of Econometrics 144 (1), 234–256.
Chapman, D. A. & Pearson, N. D. (2000), ‘Is the short rate drift actually nonlinear?’, The Journal of Finance 55 (1), 355–388.
Durlauf, S. N. & Johnson, P. A. (1995), ‘Multiple regimes and cross-country growth behaviour’, Journal of Applied Econometrics 10 (4), 365–384.
Engle, R. F. (1982), ‘Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation’, Econometrica 50 (4), 987–1007.
Eubank, R. & Speckman, P. (1994), ‘Nonparametric estimation of functions with jump discontinuities’, Lecture Notes-Monograph Series pp. 130–144.
Fan, J. & Yao, Q. (2003), ‘Nonlinear time series: nonparametric and parametric methods’, Springer.
