A new approach for open-end sequential change point monitoring

Josua Gösmann; Tobias Kley; Holger Dette

arXiv:1906.03225·math.ST·July 28, 2020


TL;DR

This paper introduces a novel sequential change point detection method for multivariate time series that improves power and flexibility by continuously comparing estimators before and after each potential change point, outperforming existing methods.

Contribution

The paper presents a new open-end sequential monitoring scheme that compares estimators at every time point, offering an asymptotic level $\alpha$ procedure with improved detection power.

Findings

01

The new method outperforms existing procedures in simulation studies.

02

It maintains an asymptotic level $\alpha$ under no change.

03

The approach is demonstrated on real data examples.

Abstract

We propose a new sequential monitoring scheme for changes in the parameters of a multivariate time series. In contrast to procedures proposed in the literature which compare an estimator from the training sample with an estimator calculated from the remaining data, we suggest to divide the sample at each time point after the training sample. Estimators from the sample before and after all separation points are then continuously compared calculating a maximum of norms of their differences. For open-end scenarios our approach yields an asymptotic level $α$ procedure, which is consistent under the alternative of a change in the parameter. By means of a simulation study it is demonstrated that the new method outperforms the commonly used procedures with respect to power and the feasibility of our approach is illustrated by analyzing two data examples.

Tables

Table 1: $(1-\alpha)$-quantiles of the distributions $L_{1,\gamma}$, $L_{2,\gamma}$ and $L_{3,\gamma}$ for different choices of $\gamma$ and different dimensions of the parameter. The cutoff constant was set to $\varepsilon=0$. The results for $L_{2,\gamma}$ and $L_{3,\gamma}$ for $p=1$ are taken from Horváth et al., (2004) and Fremdt, (2015), respectively. The quantiles for $L_{1,0}$ for $p=1$ were computed with respect to formula (2.15).

		$L_{1, γ}$			$L_{2, γ}$			$L_{3, γ}$
p	$γ$ \ $α$	0.01	0.05	0.1	0.01	0.05	0.1	0.01	0.05	0.1
1	0	3.0233	2.4977	2.2412	2.7912	2.2365	1.9497	2.8262	2.2599	1.9914
	0.25	3.1050	2.5975	2.3542	2.9445	2.3860	2.1060	2.9638	2.4296	2.1758
	0.45	3.4269	2.9701	2.7398	3.3015	2.7992	2.5437	3.3817	2.9241	2.7002
2	0	3.4022	2.8943	2.6562	3.2272	2.6794	2.4008	3.2461	2.6957	2.4266
	0.25	3.5279	3.0948	2.7781	3.3322	2.7981	2.5481	3.3630	2.8433	2.5911
	0.45	3.8502	3.3912	3.1509	3.7010	3.2046	2.9543	3.7467	3.2966	3.0620
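As a rough cross-check of the $p=1$, $\gamma=0$ rows for $L_{1,\gamma}$, the closed-form distribution function that the caption refers to as formula (2.15), $F(x)=1+8\sum_{k\geq 1}(-1)^{k}k\,(1-\Phi(kx))$, can be evaluated numerically. The following is a minimal Python sketch; the function name `cdf_L1_gamma0` and the truncation of the series after `n_terms` terms are illustrative assumptions and not part of the paper.

```python
import math


def cdf_L1_gamma0(x, n_terms=100):
    """Evaluate F(x) = 1 + 8 * sum_{k>=1} (-1)^k * k * (1 - Phi(k*x)).

    n_terms truncates the infinite series (an assumption of this sketch);
    the terms decay very quickly for x near the tabulated quantiles.
    """
    total = 0.0
    for k in range(1, n_terms + 1):
        # Upper tail of the standard normal: 1 - Phi(z) = erfc(z / sqrt(2)) / 2
        upper_tail = 0.5 * math.erfc(k * x / math.sqrt(2.0))
        total += (-1) ** k * k * upper_tail
    return 1.0 + 8.0 * total


# Tabulated (alpha, quantile) pairs for L_{1,0}, p = 1, taken from Table 1.
for alpha, q in [(0.01, 3.0233), (0.05, 2.4977), (0.1, 2.2412)]:
    print(alpha, round(cdf_L1_gamma0(q), 4))
```

If the tabulated quantiles are consistent with the formula, each printed value should be close to $1-\alpha$.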

Table 2: Type I error for the open-end procedures based on $\hat{E}$, $\hat{Q}$ and $\hat{P}$ at 5% nominal size. The size of the known stable data was set to $m=50$ (upper part), $m=100$ (lower part).

		(M1)			(M2)
$m$	$γ$	$\hat{E}$	$\hat{Q}$	$\hat{P}$	$\hat{E}$	$\hat{Q}$	$\hat{P}$
50	$0$	4.8%	5.3%	5.3%	8.4%	8.8%	9.0%
	$0.25$	5.0%	5.0%	5.3%	8.9%	8.4%	8.3%
	$0.45$	4.5%	4.4%	3.9%	7.5%	7.4%	6.4%
100	$0$	4.1%	4.4%	4.6%	6.8%	6.3%	6.6%
	$0.25$	5.0%	5.4%	5.6%	7.3%	6.7%	6.9%
	$0.45$	6.0%	6.2%	5.2%	7.0%	6.4%	6.0%

Table 3: Type I error for the open-end procedures based on $\hat{E}$, $\hat{Q}$ and $\hat{P}$ at 5% nominal size. The size of the known stable data was set to $m=100$ (upper part), $m=200$ (middle part) and $m=400$ (lower part). In brackets we report the result of simulations, in which the long-run variance estimator has been replaced by the true long-run variance.

		(M3)			(M4)
$m$	$γ$	$\hat{E}$	$\hat{Q}$	$\hat{P}$	$\hat{E}$	$\hat{Q}$	$\hat{P}$
100	$0$	12.1% (3.2)	11.4% (4.5)	11.8% (4.4)	16.3% (2.9)	15.0% (4.1)	15.9% (4.0)
	$0.25$	13.7% (3.2)	11.8% (3.8)	12.5% (3.8)	18.1% (3.1)	16.3% (3.5)	17.3% (3.5)
	$0.45$	13.2% (2.6)	12.2% (2.8)	11.6% (2.3)	16.6% (2.1)	14.3% (2.3)	13.9% (1.8)
200	$0$	7.6% (3.0)	8.1% (4.0)	8.4% (4.1)	9.4% (2.6)	10.5% (3.5)	10.6% (3.5)
	$0.25$	8.4% (3.5)	7.6% (4.0)	8.3% (4.1)	11.4% (3.0)	11.4% (3.8)	11.6% (3.8)
	$0.45$	8.7% (3.2)	8.1% (3.2)	7.4% (2.8)	11.3% (2.6)	10.6% (2.7)	10.2% (2.4)
400	$0$	5.0% (2.8)	6.0% (3.5)	6.2% (3.4)	7.3% (2.6)	7.8% (3.6)	8.2% (3.4)
	$0.25$	6.2% (3.5)	6.2% (4.0)	6.3% (4.1)	8.2% (3.1)	8.1% (3.8)	8.7% (3.6)
	$0.45$	6.9% (3.1)	6.2% (3.3)	5.7% (2.9)	7.8% (2.9)	7.9% (2.9)	7.0% (2.6)

Table 4: Type I error for the open-end procedures based on $\hat{E}$, $\hat{Q}$ and $\hat{P}$ at 5% nominal size. The size of the known stable data was set to $m=100$.

	(LM1)			(LM2)
$γ$	$\hat{E}$	$\hat{Q}$	$\hat{P}$	$\hat{E}$	$\hat{Q}$	$\hat{P}$
$0$	6.4%	6.5%	6.7%	7.2%	6.7%	7.2%
$0.25$	7.6%	8.8%	9.1%	8.5%	9.6%	9.5%
$0.45$	12.0%	12.2%	12.1%	12.6%	12.2%	12.6%

Table 5: $(1-\alpha)$-quantiles of the distributions $L_{1,\gamma}(4)$, $L_{2,\gamma}(4)$ and $L_{3,\gamma}(4)$ for different choices of $\gamma$. The cutoff constant was set to $\varepsilon=0$ and the dimension is $p=1$. The quantiles for $L_{1,0}(4)$ were computed with respect to formula (C.7).

	$L_{1, γ} (4)$			$L_{2, γ} (4)$			$L_{3, γ} (4)$
$γ$ \ $α$	0.01	0.05	0.1	0.01	0.05	0.1	0.01	0.05	0.1
0	2.7042	2.2339	2.0046	2.5145	1.9826	1.7380	2.5572	2.0435	1.8019
0.25	2.9558	2.4345	2.2220	2.7602	2.2223	1.9799	2.8210	2.2986	2.0750
0.45	3.3850	2.9371	2.6994	3.2238	2.7398	2.4952	3.3156	2.8626	2.6274

Table 6: Type I error for the closed-end procedures for a change in the mean based on the statistics $\hat{E}$, $\hat{Q}$ and $\hat{P}$ at 5% nominal size with a training data set of size $m=200$ and a monitoring window of $T=4$.

	(M1)			(M2)
$γ$	$\hat{E}$	$\hat{Q}$	$\hat{P}$	$\hat{E}$	$\hat{Q}$	$\hat{P}$
$0$	5.0%	5.3%	5.3%	8.0%	7.3%	7.4%
$0.25$	5.4%	6.0%	5.8%	8.6%	7.5%	8.1%
$0.45$	4.9%	5.4%	4.5%	6.1%	6.4%	5.9%

Table 7: Type I error for the closed-end procedures for a change in the mean based on the statistics $\hat{E}$, $\hat{Q}$ and $\hat{P}$ at 5% nominal size with a training data set of size $m=400$ and a monitoring window of $T=4$.

	(M3)			(M4)
$γ$	$\hat{E}$	$\hat{Q}$	$\hat{P}$	$\hat{E}$	$\hat{Q}$	$\hat{P}$
$0$	7.6%	7.8%	8.1%	9.9%	9.8%	10.1%
$0.25$	8.4%	8.2%	8.7%	11.4%	10.4%	10.8%
$0.45$	6.6%	6.5%	6.8%	8.9%	8.3%	8.5%

Equations

H_{0}

H_{1}

\hat{\theta}_{1}^{m}-\hat{\theta}_{m+1}^{m+k},

\big\{\hat{\theta}_{1}^{m}-\hat{\theta}_{m+j+1}^{m+k}\big\}_{j=0,\ldots,k-1}

\big\{\hat{\theta}_{1}^{m+j}-\hat{\theta}_{m+j+1}^{m+k}\big\}_{j=0,\ldots,k-1}\;.

\hat{F}_{i}^{j}(z)=\frac{1}{j-i+1}\sum_{t=i}^{j}I\{X_{t}\leq z\}

\hat{E}_{m}(k)=m^{-1/2}\max_{j=0}^{k-1}(k-j)\Big\|\hat{\theta}_{1}^{m+j}-\hat{\theta}_{m+j+1}^{m+k}\Big\|_{\hat{\Sigma}_{m}^{-1}},

w(k/m)\hat{E}_{m}(k)>c(\alpha)

\hat{D}_{m}(k)=m^{-3/2}\max_{j=0}^{k-1}(m+j)(k-j)\Big\|\hat{\theta}_{1}^{m+j}-\hat{\theta}_{m+j+1}^{m+k}\Big\|_{\hat{\Sigma}_{m}^{-1}},

\max_{k=1}^{mT}w(k/m)\hat{D}_{m}(k)\overset{\mathcal{D}}{\Longrightarrow}\max_{t\in[0,T]}w(t)\max_{s\in[0,t]}\big|(s+1)W(t+1)-(t+1)W(s+1)\big|,

\mathcal{IF}(x,F,\theta)=\lim_{\varepsilon\searrow 0}\frac{\theta((1-\varepsilon)F+\varepsilon\delta_{x})-\theta(F)}{\varepsilon},

\hat{\theta}_{i}^{j}-\theta=\theta(\hat{F}_{i}^{j})-\theta(F)=\frac{1}{j-i+1}\sum_{t=i}^{j}\mathcal{IF}(X_{t},F,\theta)+R_{i,j}

\mathcal{IF}_{t}=\mathcal{IF}(X_{t},F_{t},\theta),

\sup_{k=1}^{\infty}\dfrac{1}{k^{\xi}}\bigg|\sum_{t=m+1}^{m+k}\mathcal{IF}_{t}-\sqrt{\Sigma}W_{m,1}(k)\bigg|=\mathcal{O}_{\mathbb{P}}(1)

\dfrac{1}{m^{\xi}}\bigg|\sum_{t=1}^{m}\mathcal{IF}_{t}-\sqrt{\Sigma}W_{m,2}(m)\bigg|=\mathcal{O}_{\mathbb{P}}(1)

w(t)=\tilde{w}(t)I\{t_{w}\leq t\leq T_{w}\}

\max_{\substack{i,j=1\\ i<j}}^{k}\frac{(j-i+1)}{k}|R_{i,j}|=o(1)

\max_{\substack{i,j=1\\ i<j}}^{k}\frac{(j-i+1)}{k^{1/2-\gamma}}|R_{i,j}|=o(1)\quad\text{a.s.},

\sup_{k=1}^{\infty}\max_{\substack{i,j=1\\ i<j}}^{m+k}\frac{(j-i+1)}{(m+k)^{1/2}}|R_{i,j}|=\sup_{k=m+1}^{\infty}\max_{\substack{i,j=1\\ i<j}}^{k}\frac{(j-i+1)}{k^{1/2}}|R_{i,j}|=o(1)\quad\text{a.s. as } m\to\infty.

\sup_{k=1}^{\infty}w(k/m)\hat{E}_{m}(k)\overset{\mathcal{D}}{\Longrightarrow}\sup_{0\leq t<\infty}\max_{0\leq s\leq t}(t+1)w(t)\Big|W\Big(\dfrac{s}{s+1}\Big)-W\Big(\dfrac{t}{t+1}\Big)\Big|,

\mathbb{P}\bigg(\sup_{0\leq t<\infty}\max_{0\leq s\leq t}(t+1)w(t)\Big|W\Big(\dfrac{s}{s+1}\Big)-W\Big(\dfrac{t}{t+1}\Big)\Big|>c(\alpha)\bigg)\leq\alpha.

\limsup_{m\to\infty}\;\mathbb{P}\bigg(\sup_{k=1}^{\infty}w(k/m)\hat{E}_{m}(k)>c(\alpha)\bigg)\leq\alpha.

w_{\gamma}(t)=(1+t)^{-1}\max\Big\{\Big(\dfrac{t}{1+t}\Big)^{\gamma},\,\varepsilon\Big\}^{-1}\qquad\text{with}\qquad 0\leq\gamma<1/2,

\sup_{0\leq t<\infty}\max_{0\leq s\leq t}(t+1)w_{\gamma}(t)\Big|W\Big(\dfrac{s}{s+1}\Big)-W\Big(\dfrac{t}{t+1}\Big)\Big|\overset{\mathcal{D}}{=}\sup_{0\leq t<1}\max_{0\leq s\leq t}\dfrac{1}{\max\{t^{\gamma},\varepsilon\}}\Big|W(t)-W(s)\Big|:=L_{1,\gamma}.

\sup_{0\leq t<1}\max_{0\leq s\leq t}\Big|W(t)-W(s)\Big|=\max_{0\leq t\leq 1}W(t)-\min_{0\leq t\leq 1}W(t),

F_{L_{1},\gamma=0}(x)=1+8\sum_{k=1}^{\infty}(-1)^{k}\cdot k\cdot\big(1-\Phi(kx)\big),

\theta_{m}^{(1)}:=\theta(F_{1})=\theta(F_{2})=\dots=\theta(F_{m+k_{m}^{*}-1})\neq\theta_{m}^{(2)}:=\theta(F_{m+k_{m}^{*}})=\theta(F_{m+k_{m}^{*}+1})=\dots,

\sqrt{m}\Big|\theta^{(1)}_{m}-\theta^{(2)}_{m}\Big|\underset{m\to\infty}{\Longrightarrow}\infty.

\dfrac{1}{\sqrt{m+k^{*}_{m}}}\bigg|\sum_{t=1}^{m+k_{m}^{*}-1}\mathcal{IF}_{t}\bigg|=\mathcal{O}_{\mathbb{P}}(1)\;\;\;\text{and}\;\;\;\sqrt{m+k_{m}^{*}}\,|R_{1,m+k_{m}^{*}-1}|=\mathcal{O}_{\mathbb{P}}(1).

\dfrac{1}{\sqrt{m}}\bigg|\sum_{t=m+k_{m}^{*}}^{m+k_{m}^{*}+\lfloor c_{a}m\rfloor}\mathcal{IF}_{t}\bigg|=\mathcal{O}_{\mathbb{P}}(1)\;\;\;\text{and}\;\;\;\sqrt{m}\Big|R_{m+k_{m}^{*},m+k_{m}^{*}+\lfloor c_{a}m\rfloor}\Big|=\mathcal{O}_{\mathbb{P}}(1)


Full text


A new approach for open-end sequential change point monitoring

Josua Gösmann

Ruhr-Universität Bochum

Fakultät für Mathematik

44780 Bochum, Germany

[email protected]

(corresponding author)

Tobias Kley

University of Bristol

School of Mathematics

Bristol BS8 1UG, United Kingdom

[email protected]

Holger Dette

Ruhr-Universität Bochum

Fakultät für Mathematik

44780 Bochum, Germany

[email protected]

Abstract

We propose a new sequential monitoring scheme for changes in the parameters of a multivariate time series. In contrast to procedures proposed in the literature which compare an estimator from the training sample with an estimator calculated from the remaining data, we suggest to divide the sample at each time point after the training sample. Estimators from the sample before and after all separation points are then continuously compared calculating a maximum of norms of their differences. For open-end scenarios our approach yields an asymptotic level $\alpha$ procedure, which is consistent under the alternative of a change in the parameter. By means of a simulation study it is demonstrated that the new method outperforms the commonly used procedures with respect to power and the feasibility of our approach is illustrated by analyzing two data examples.

MSC classification: 62L99, 62F03

JEL classification: C01,C22

Keywords and phrases: change point analysis, open-end procedures, sequential monitoring

1 Introduction

Nowadays, nearly all fields of application require sophisticated statistical modeling and statistical inference to draw scientific conclusions from the observed data. In many cases data is time dependent and the involved model parameters or the model itself may not necessarily be stable. In such situations it is of particular importance to detect changes in the processed data as soon as possible and to adapt the statistical analysis accordingly. These changes are usually called change points or structural breaks. Due to their universality, methods for change point analysis have a vast field of possible applications, ranging from natural sciences, like biology and meteorology, to humanities, like economics, finance, and social sciences. Since the seminal papers of Page, (1954, 1955) the problem of detecting change points in time series has received substantial attention in the statistical literature. The contributions to this field can be roughly divided into the areas of retrospective and sequential change point analysis.

In the retrospective case, historical data sets are examined with the aim of testing for changes and identifying their position within the data. In this setup, the data is assumed to be completely available before the statistical analysis is started (a-posteriori analysis). A comprehensive overview of retrospective change point analysis can be found in Aue and Horváth, (2013). In many practical applications, however, data arrives consecutively and breaks can occur at any new data point. In such cases the statistical analysis for changes in the processed data has to start immediately with the aim of detecting changes as soon as possible. This field of statistics is called sequential change point detection or online change point detection.

For most of the 20th century, the problem of sequential change point detection was tackled using procedures that are optimized for a minimal detection delay but usually do not control the probability of a false alarm (type I error). These methods are called control charts and a comprehensive review can be found in Lai, (1995, 2001). A new paradigm was then introduced by Chu et al., (1996), who use initial data sets and employ invariance principles to also control the type I error. The methods developed under this paradigm [see below] can again be subdivided into closed-end and open-end approaches. In closed-end scenarios monitoring is stopped at a fixed pre-defined point of time, while in open-end scenarios monitoring can, in principle, continue forever if no change point is detected.

In the paper at hand we develop a new approach for sequential change point detection in an open-end scenario. To be more precise let $\{X_{t}\}_{t\in\mathbb{Z}}$ denote a $d$-dimensional time series and let $F_{t}$ be the distribution function of the random variable $X_{t}$ at time $t$. We are studying monitoring procedures for detecting changes of a parameter $\theta_{t}=\theta(F_{t})$, where $\theta=\theta(F)$ is a $p$-dimensional parameter of a distribution function $F$ on $\mathbb{R}^{d}$ (such as the mean, variance, correlation, etc.). In particular we will develop a decision rule for the hypothesis of a constant parameter, that is

[TABLE]

against the alternative that the parameter changes (once) at some time $m+k^{\star}$ with $k^{\star}\geq 1$, that is

[TABLE]

In this setup, which was originally introduced by Chu et al., (1996), the first $m$ observations are assumed to be stable and will serve as an initial training set. The problem of sequential change point detection in the hypotheses paradigm as pictured above has received substantial interest. Since the seminal paper of Chu et al., (1996) several authors have worked in this area. Berkes et al., (2004) designed a detector for changes in the parameters of a GARCH process. Horváth et al., (2004), Aue et al., (2006), Aue et al., (2009b), Fremdt, (2015) and Aue et al., (2014) developed methodology for detecting changes in the coefficients of a linear model, while Wied and Galeano, (2013) and Pape et al., (2016) considered sequential monitoring schemes for changes in special functionals such as the correlation or variance. A MOSUM approach was employed by Leisch et al., (2000), Horváth et al., (2008) or Chen and Tian, (2010) to monitor the mean and linear models, respectively. Recently, Hoga, (2017) used an $\ell_{1}$-norm to detect changes in the mean and variance of a multivariate time series, Kirch and Weber, (2018) defined a unifying framework for detecting changes in different parameters with the help of several statistics, and Otto and Breitung, (2020) considered a Backward CUSUM, which monitors changes based on recursive residuals in a linear model. A helpful but not exhaustive overview of different sequential procedures can be found in Section 1, in particular Table 1, of Anatolyev and Kosenok, (2018).

A common feature of all procedures in the cited literature consists in the comparison of estimators from different subsamples of the data. To be precise, let $X_{1},\ldots,X_{m}$ denote an initial training sample and $X_{1},\ldots,X_{m},\ldots,X_{m+k}$ be the available data at time $m+k$. Several authors propose to investigate the differences

$$\hat{\theta}_{1}^{m}-\hat{\theta}_{m+1}^{m+k}, \tag{1.3}$$

(in dependence of $k$), where $\hat{\theta}_{i}^{j}$ denotes the estimator of the parameter from the sample $X_{i},\ldots,X_{j}$. In the sequential change point literature monitoring schemes based on the differences (1.3) are usually called (ordinary) CUSUM procedures and have been considered by Horváth et al., (2004), Aue et al., (2006), Aue et al., (2009b), Aue et al., (2014), Schmitz and Steinebach, (2010) or Hoga, (2017). Other authors suggest using a function of the differences

$$\big\{\hat{\theta}_{1}^{m}-\hat{\theta}_{m+j+1}^{m+k}\big\}_{j=0,\ldots,k-1} \tag{1.4}$$

(in dependence of $k$) and the corresponding procedures are usually called Page-CUSUM tests [see Fremdt, (2015), Aue et al., (2015), or Kirch and Weber, (2018) among others]. As an alternative we propose, following ideas of Dette and Gösmann, (2019), a monitoring scheme based on a function of the differences

$$\big\{\hat{\theta}_{1}^{m+j}-\hat{\theta}_{m+j+1}^{m+k}\big\}_{j=0,\ldots,k-1}\;. \tag{1.5}$$

A possible advantage of (1.5) over (1.3) is the screening for all potential positions of the change point, which takes into account that the change point does not necessarily occur at observation $X_{m+1}$, so that $\hat{\theta}_{m+1}^{m+k}$ may be 'corrupted' by pre-change observations. This issue is also partially addressed by (1.4), where different positions are examined and compared with the estimator of the parameter from the training sample. We will demonstrate in Section 4 that sequential monitoring schemes based on the differences (1.5) yield a substantial improvement in power compared to the commonly used methods based on (1.3) and (1.4). To avoid misunderstandings, the reader should note that a (total) comparison based on differences of the form (1.5) is typically also called a CUSUM approach in retrospective change point analysis, see Aue and Horváth, (2013) for a comprehensive overview of (retrospective) change point analysis.
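The three families of differences can be made concrete for the simplest case, the mean of a univariate series. The following Python sketch is only an illustration under that assumption; the function name `difference_schemes` and the toy data are hypothetical and not taken from the paper.

```python
import numpy as np


def difference_schemes(x, m, k):
    """Return the differences of type (1.3), (1.4) and (1.5) for the sample mean.

    x : observations X_1, ..., X_{m+k} (0-based array), m : training size,
    k : number of monitoring observations seen so far.
    """
    x = np.asarray(x, dtype=float)[: m + k]
    train_mean = x[:m].mean()
    # (1.3): training estimator vs. estimator from all monitoring data
    cusum = train_mean - x[m:].mean()
    # (1.4): training estimator vs. estimators from X_{m+j+1}, ..., X_{m+k}
    page = np.array([train_mean - x[m + j:].mean() for j in range(k)])
    # (1.5): estimator from X_1, ..., X_{m+j} vs. estimator from X_{m+j+1}, ..., X_{m+k}
    proposed = np.array([x[: m + j].mean() - x[m + j:].mean() for j in range(k)])
    return cusum, page, proposed


# Toy data: the mean jumps from 0 to 1 five observations after the training sample.
x = [0.0] * 15 + [1.0] * 5
cusum, page, proposed = difference_schemes(x, m=10, k=10)
print(abs(cusum), np.abs(page).max(), np.abs(proposed).max())
```

In this toy example the single difference (1.3) mixes pre-change and post-change monitoring observations, whereas the families (1.4) and (1.5) each contain a split at the true change position. The sketch shows unweighted differences only and says nothing about power, which depends on the weighting and normalization used in the actual statistics.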

The present paper is devoted to a rigorous statistical analysis of a sequential monitoring scheme based on the differences defined in (1.5) in the context of an open-end scenario. In Section 2 we introduce the new procedure and develop a corresponding asymptotic theory to obtain critical values such that monitoring can be performed at a controlled type I error. The theory is broadly applicable to detect changes in a general parameter $\theta$ of a multivariate time series. Like all monitoring schemes in this context, the method depends on a weight function, and we also discuss the choice of this function. In particular we derive an interesting result regarding this choice and establish a connection to corresponding ideas of Horváth et al., (2004) and Fremdt, (2015), which may also be of interest in closed-end scenarios.

In Section 3 we discuss several special cases and demonstrate that the new methodology is applicable to detect changes in the mean and the parameters of a linear model. We present a small simulation study in Section 4, where we compare our approach to those developed by Horváth et al., (2004) and Fremdt, (2015). In particular we demonstrate that the monitoring scheme based on the differences (1.5) yields a test with a controlled type I error and a smaller type II error than the procedures in the cited references. In Section 5 we illustrate our approach and compare it to other monitoring schemes by applying it to two examples where the parameter of a linear model of financial data is monitored around the time of the United Kingdom European Union membership referendum 2016. Finally, all proofs are deferred to the online appendix [see Gösmann et al., (2020)], in which we additionally provide some extra simulation results and briefly discuss how our statistic can be used in closed-end scenarios.

2 Asymptotic properties

Throughout this paper let $F$ denote a $d$-dimensional distribution function and $\theta=\theta(F)$ a $p$-dimensional parameter of $F$. We will denote by

$$\hat{F}_{i}^{j}(z)=\frac{1}{j-i+1}\sum_{t=i}^{j}I\{X_{t}\leq z\}$$

the empirical distribution function of observations $X_{i},\dots,X_{j}$ (here the inequality is understood component-wise) and consider the canonical estimator $\hat{\theta}_{i}^{j}=\theta(\hat{F}_{i}^{j})$ of the parameter $\theta$ from the sample $X_{i},\ldots,X_{j}$.

To test the hypotheses (1.1) and (1.2) in the described online setting in an open-end scenario we propose a monitoring scheme defined by

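$$\hat{E}_{m}(k)=m^{-1/2}\max_{j=0}^{k-1}(k-j)\Big\|\hat{\theta}_{1}^{m+j}-\hat{\theta}_{m+j+1}^{m+k}\Big\|_{\hat{\Sigma}_{m}^{-1}}, \tag{2.2}$$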

where the statistic $\hat{\Sigma}_{m}$ denotes an estimator of the long-run variance matrix $\Sigma$ (defined in Assumption 2.3) and the symbol $\|v\|_{A}^{2}=v^{\top}Av$ denotes a weighted norm of the vector $v$ induced by the positive definite matrix $A$. The monitoring is then performed as follows. With observation $X_{m+k}$ arriving, one computes $\hat{E}_{m}(k)$ and compares it to an appropriate weight function, which is sometimes also called threshold function, say $w$. If

$$w(k/m)\hat{E}_{m}(k)>c(\alpha) \tag{2.3}$$

occurs, monitoring is stopped and the null hypothesis (1.1) is rejected in favor of the alternative (1.2). If the inequality (2.3) does not hold, monitoring is continued with the next observation $X_{m+k+1}$. We will derive the limiting distribution of $\sup_{k=1}^{\infty}\hat{E}_{m}(k)w(k/m)$ in Theorem 2.7 below to determine the constant $c(\alpha)$ involved in (2.3), such that the test keeps a nominal level of $\alpha$ (asymptotically as $m\to\infty$).
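The monitoring rule described above can be sketched in a few lines of Python for the special case of the mean of a univariate series ($p=1$). Several choices in this sketch are assumptions made for illustration only: the function names are hypothetical, the long-run variance is estimated by the sample variance of the training data (reasonable only for uncorrelated observations), and the weight function is the family $w_{\gamma}(t)=(1+t)^{-1}\max\{(t/(1+t))^{\gamma},\varepsilon\}^{-1}$ that appears among the paper's equations. The critical value 2.4977 is the tabulated 95% quantile of $L_{1,0}$ for $p=1$.

```python
import numpy as np


def w_gamma(t, gamma=0.0, eps=0.0):
    """Weight function w_gamma(t) = (1+t)^{-1} * max{(t/(1+t))^gamma, eps}^{-1}, t > 0."""
    return 1.0 / ((1.0 + t) * max((t / (1.0 + t)) ** gamma, eps))


def e_hat(x, m, k, sigma):
    """Statistic E_m(k) for the mean of a univariate series.

    For p = 1 the weighted norm ||v||_{Sigma^{-1}} reduces to |v| / sigma.
    """
    x = np.asarray(x, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(x[: m + k])))
    total = csum[m + k]
    best = 0.0
    for j in range(k):
        n_pre, n_post = m + j, k - j
        mean_pre = csum[n_pre] / n_pre              # estimator from X_1, ..., X_{m+j}
        mean_post = (total - csum[n_pre]) / n_post  # estimator from X_{m+j+1}, ..., X_{m+k}
        best = max(best, n_post * abs(mean_pre - mean_post) / sigma)
    return best / np.sqrt(m)


def monitor(x, m, c_alpha, gamma=0.0, eps=0.0):
    """Return the first k with w(k/m) * E_m(k) > c_alpha, or None if no alarm is raised."""
    x = np.asarray(x, dtype=float)
    # Assumption of this sketch: sample standard deviation of the training data
    # stands in for the square root of the long-run variance estimator.
    sigma = np.std(x[:m], ddof=1)
    for k in range(1, len(x) - m + 1):
        if w_gamma(k / m, gamma, eps) * e_hat(x, m, k, sigma) > c_alpha:
            return k
    return None


# Toy example: stable training data, then a large upward shift in the mean.
x = [1.0, -1.0] * 25 + [10.0] * 20
print(monitor(x, m=50, c_alpha=2.4977))
```

Recomputing the maximum over all $j$ at every step costs $O(k)$ per new observation; this keeps the sketch close to the definition rather than optimizing it.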

Remark 2.1

The statistic (2.2) is related to a detection scheme, which was recently proposed by Dette and Gösmann, (2019) for the closed-end case, where monitoring ends with observation $mT$, for some $T\in\mathbb{N}$. These authors considered the statistic

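$$\hat{D}_{m}(k)=m^{-3/2}\max_{j=0}^{k-1}(m+j)(k-j)\Big\|\hat{\theta}_{1}^{m+j}-\hat{\theta}_{m+j+1}^{m+k}\Big\|_{\hat{\Sigma}_{m}^{-1}}, \tag{2.4}$$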

and showed

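$$\max_{k=1}^{mT}w(k/m)\hat{D}_{m}(k)\overset{\mathcal{D}}{\Longrightarrow}\max_{t\in[0,T]}w(t)\max_{s\in[0,t]}\big|(s+1)W(t+1)-(t+1)W(s+1)\big|, \tag{2.5}$$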

where $W$ denotes a $p$-dimensional Brownian motion and throughout this paper the symbol $\overset{\mathcal{D}}{\Longrightarrow}$ denotes weak convergence (in the space under consideration). [To avoid confusion, note that in the reference Dette and Gösmann, (2019) the weight function was defined as $w=1/\overline{w}$ for some appropriate function $\overline{w}$.] However, this statistic cannot be considered in an open-end scenario for the typical weight functions considered in the literature satisfying $\limsup_{t\to\infty}tw(t)<\infty$ (in this case the limit on the right-hand side of (2.5) would be almost surely infinite for $T=\infty$). As weight functions satisfying $\limsup_{t\to\infty}t^{2}w(t)<\infty$ will cause a loss in power as indicated in an unpublished simulation study, we propose to replace the factor $(m+j)$ in (2.4) by the size of the initial sample $m$, which leads to the monitoring scheme defined by (2.2). The remaining weight factor $(k-j)$ is retained as it allocates smaller weights to the case when the post-change estimator $\hat{\theta}_{m+j+1}^{m+k}$ contains greater uncertainty as $j$ is close to $k$.

Remark 2.2

An essential disadvantage of closed-end scenarios as considered in Dette and Gösmann, (2019) is the problem of choosing the end-point of monitoring before the procedure is launched. This problem disappears when open-end scenarios are employed, where monitoring can (theoretically) proceed forever if no change has been detected. Even if the statistical problems of closed- and open-end scenarios are naturally related, the reader should note that the mathematical/technical approach to the two problems is completely different. In the closed-end case it is usually sufficient to assume the existence of functional central limit theorems (FCLTs) as the underlying time frame is compact [see for instance Aue et al., (2012), Wied and Galeano, (2013), Pape et al., (2016), Dette and Gösmann, (2019)]. To the authors' best knowledge, an FCLT is insufficient in the open-end case and one commonly assumes stronger, uniform stochastic approximations or combines an FCLT with Hájek-Rényi type inequalities [see also Section 2, Horváth et al., (2004), Aue et al., (2009b), Aue et al., (2009a), Fremdt, (2014), Fremdt, (2015), Kirch and Weber, (2018)].

To discuss the asymptotic properties of our approach, we require the following notation. We denote the non-negative reals by $\mathbb{R}_{\geq 0}$ and define $\mathbb{R}_{+}:=\mathbb{R}_{\geq 0}\setminus\{0\}$. The symbol $\overset{\mathbb{P}}{\Longrightarrow}$ denotes convergence in probability. The process $\{W(s)\}_{s\in[0,\infty)}$ will represent a standard $p$-dimensional Brownian motion with independent components. For a vector $v\in\mathbb{R}^{d}$, we denote by $|v|=\big(\sum_{i=1}^{d}v_{i}^{2}\big)^{1/2}$ its Euclidean norm. By $\lfloor x\rfloor$ for $x\in\mathbb{R}$ we denote the largest integer smaller than or equal to $x$. For the sake of a clear distinction we will employ $\sup\limits_{i=1}^{n}a(i)$ for discrete indexing (with integer arguments) and $\sup\limits_{0\leq x\leq 1}a(x)$ for continuous indexing (with arguments taken from the interval $[0,1]$ or another subset of $\mathbb{R}$).

Next, we define the influence function (assuming its existence) by

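$$\mathcal{IF}(x,F,\theta)=\lim_{\varepsilon\searrow 0}\frac{\theta((1-\varepsilon)F+\varepsilon\delta_{x})-\theta(F)}{\varepsilon},$$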

where $\delta_{x}(z)=I\{x\leq z\}$ is the distribution function of the Dirac measure at the point $x\in\mathbb{R}^{d}$ and the inequality in the indicator is again understood component-wise. We will focus on functionals that allow for an asymptotic linearization in terms of the influence function, that is

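$$\hat{\theta}_{i}^{j}-\theta=\theta(\hat{F}_{i}^{j})-\theta(F)=\frac{1}{j-i+1}\sum_{t=i}^{j}\mathcal{IF}(X_{t},F,\theta)+R_{i,j} \tag{2.7}$$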

with asymptotically negligible remainder terms $R_{i,j}$. Finally, for the sake of readability we introduce the following abbreviation

$$\mathcal{IF}_{t}=\mathcal{IF}(X_{t},F_{t},\theta),$$

where $F_{t}$ is again the distribution function of $X_{t}$. Under the null hypothesis (1.1) we will impose the following assumptions on the underlying time series.

Assumption 2.3 (Approximation)

The time series $\{X_{t}\}_{t\in\mathbb{Z}}$ is (strictly) stationary, such that $F_{t}=F$ for all $t\in\mathbb{Z}$. Further, for each $m\in\mathbb{N}$ there exist two independent, $p$-dimensional standard Brownian motions $W_{m,1}$ and $W_{m,2}$, such that for some positive constant $\xi<1/2$ the following approximations hold

$$\sup_{k=1}^{\infty}\dfrac{1}{k^{\xi}}\bigg|\sum_{t=m+1}^{m+k}\mathcal{IF}_{t}-\sqrt{\Sigma}W_{m,1}(k)\bigg|=\mathcal{O}_{\mathbb{P}}(1)$$

and

$$\dfrac{1}{m^{\xi}}\bigg|\sum_{t=1}^{m}\mathcal{IF}_{t}-\sqrt{\Sigma}W_{m,2}(m)\bigg|=\mathcal{O}_{\mathbb{P}}(1)$$

as $m\to\infty$, where $\Sigma=\sum_{t\in\mathbb{Z}}\operatorname{Cov}\big(\mathcal{IF}_{0},\,\mathcal{IF}_{t}\big)\in\mathbb{R}^{p\times p}$ denotes the long-run variance matrix of the process $\big\{\mathcal{IF}_{t}\big\}_{t\in\mathbb{Z}}$, which we assume to exist and to be non-singular.
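To make the long-run variance $\Sigma=\sum_{t\in\mathbb{Z}}\operatorname{Cov}(\mathcal{IF}_{0},\mathcal{IF}_{t})$ more tangible, the following Python sketch estimates it in the univariate case with a Bartlett-kernel (Newey-West type) estimator. This particular estimator, the function name `long_run_variance` and the `bandwidth` parameter are assumptions of the sketch; the text above only requires some estimator $\hat{\Sigma}_{m}$ and does not prescribe this one.

```python
import numpy as np


def long_run_variance(u, bandwidth):
    """Bartlett-kernel estimate of sum_h Cov(u_0, u_h) for a univariate series u.

    u would typically hold (estimated) influence function values from the
    training sample, e.g. the observations themselves when monitoring the mean.
    """
    u = np.asarray(u, dtype=float)
    u = u - u.mean()
    n = len(u)
    lrv = np.dot(u, u) / n  # lag-0 autocovariance
    for h in range(1, bandwidth + 1):
        gamma_h = np.dot(u[h:], u[:-h]) / n         # lag-h autocovariance
        lrv += 2.0 * (1.0 - h / (bandwidth + 1)) * gamma_h  # Bartlett weight
    return lrv


print(long_run_variance([1.0, 2.0, 3.0, 4.0], bandwidth=1))
```

With `bandwidth=0` the estimate reduces to the (biased) sample variance, which is only appropriate for uncorrelated data; larger bandwidths include more autocovariances at the cost of higher variability.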

Assumption 2.4 (Weight function)

The weight function $w:\mathbb{R}_{\geq 0}\to\mathbb{R}_{\geq 0}$ is of the form

$$w(t)=\tilde{w}(t)I\{t_{w}\leq t\leq T_{w}\} \tag{2.10}$$

for $t_{w}\geq 0$ and $T_{w}\in\mathbb{R}_{+}\cup\{\infty\}$. Further $\tilde{w}:\mathbb{R}_{\geq 0}\to\mathbb{R}_{+}$ is a positive continuous function and in case of $T_{w}=\infty$ it additionally satisfies

(1)

$\limsup_{t\to\infty}t\tilde{w}(t)<\infty$,

(2)

$1/\tilde{w}$ is uniformly continuous on $\mathbb{R}_{\geq 0}$.

Assumption 2.5 (Linearization)

The remainder terms in the linearization (2.7) satisfy

$$\max_{\substack{i,j=1\\ i<j}}^{k}\frac{(j-i+1)}{k}|R_{i,j}|=o(1) \tag{2.11}$$

as $k\to\infty$ with probability one.

Remark 2.6

Let us give a brief explanation on the assumptions stated above.

(i)

Assumption 2.3 is a uniform invariance principle and frequently used in the (sequential) change point literature [see for example Aue et al., (2006) or Fremdt, (2015) among others]. Following the lines of Aue et al., (2006) Assumption 2.3 can be verified by employing the multivariate strong approximation results derived by Eberlein, (1986). This is already spelled out for augmented GARCH-processes in Lemma A.1 of Aue et al., (2006) for the one-dimensional case. Assumption 2.3 is stronger than a functional central limit theorem (FCLT), which is usually sufficient to work with in a closed-end setup [see for example Wied and Galeano, (2013), Pape et al., (2016) or Dette and Gösmann, (2019)]. Another possible starting point to cope with open-end scenarios is an FCLT for any fixed time horizon together combined with Háyék-Réyni-Inequalities [see for example Kirch and Weber, (2018) or Kirch and Stoehr, (2019)]. As this is less frequently used in the literature, we will remain with the other approach. 2. (ii)

Assumption 2.4 gives restrictions on the feasible set of weight functions, which are required for the existence of a (weak) limit derived in Theorem 2.7. The cutoffs defined in (2.10) serve only for technical purposes. By choosing $t_{w}>0$ a delay at monitoring start is introduced, which can avoid problems with false alarms due to instability [see also Kirch and Weber, (2018)]. Selecting $T_{w}<\infty$ allows our theory to additionally cover closed-end scenarios, which we briefly discuss in Section C of the online appendix [see Gösmann et al., (2020)]. Note that in case of $t_{w}=0$ and $T_{w}=\infty$ the cutoffs disappear, such that $w$ and $\tilde{w}$ coincide.

(iii)

It is worth mentioning that it is also possible to define the functions $w,\tilde{w}$ on the smaller domain $\mathbb{R}_{+}$ , while additionally demanding that $\lim_{t\to 0}t^{\gamma}\tilde{w}(t)=0$ for a constant $0\leq\gamma<1/2$ . In this case, the assumption for the remainders in (2.11) has to be replaced by

[TABLE]

which would have the upside of allowing an unbounded weighting at zero. However, for the sake of a transparent presentation, we use Assumption 2.4 here, as this also simplifies the technical arguments in the proofs later on.

(iv)

Assumption 2.5 is crucial for the proof of our main theorem and directly implies

[TABLE]

Note that in the location model $\theta(F)=\mathbb{E}_{F}[X]$ we have $R_{i,j}=0$ and (2.11) obviously holds. In general however, Assumption 2.5 is highly non-trivial and crucially depends on the structure of the functional $\theta$ and the time series $\{X_{t}\}_{t\in\mathbb{Z}}$ . For a comprehensive discussion the reader is referred to Dette and Gösmann, (2019), where the estimate (2.11) has been verified in probability for different functionals including quantiles and variance.

The following result is the main theorem of this section.

Theorem 2.7

Assume that the null hypothesis (1.1) and Assumptions 2.3 - 2.5 hold. If further $\hat{\Sigma}_{m}$ is a consistent and non-singular estimator of the long-run variance matrix $\Sigma$ , it holds that

[TABLE]

as $m\rightarrow\infty$ , where $W$ is a $p$ -dimensional Brownian motion with independent components and $|\cdot|$ denotes the Euclidean norm.

For the sake of completeness, the reader should note that due to Assumption 2.4 the asymptotic behaviour of the weight function guarantees that the random variable on the right-hand side of (2.12) is finite (with probability one).

In light of Theorem 2.7 one can choose a constant $c(\alpha)$ , such that

[TABLE]

Note that for Theorem 2.7 we only require that $\hat{\Sigma}_{m}$ is a consistent estimator for the long-run variance (LRV) as $m\to\infty$ . Under both, $H_{0}$ and $H_{1}$ , such an estimator should be computed from the initial stable set, which prevents the estimate from being corrupted by possible changes/breaks [see also the discussion in Section 4]. In practice, the actual choice of LRV-estimator depends on the concrete application and is crucial for the performance of the procedure. A more extensive discussion on LRV-estimation (not only for change point problems) can be found in Andrews, (1991) or Shao and Zhang, (2010).

The following corollary then states that our approach leads to a level $\alpha$ detection scheme.

Corollary 2.8

Grant the assumptions of Theorem 2.7 and further let $c(\alpha)$ satisfy inequality (2.13), then

[TABLE]

The limit distribution obtained in Theorem 2.7 strongly depends on the considered weight function. A special family of functions that has received considerable attention [see Horváth et al., (2004), Fremdt, (2015), Kirch and Weber, (2018) among many others] is given by

[TABLE]

where the cutoff $\varepsilon>0$ can be chosen arbitrarily small in applications and only serves to reduce the assumptions and technical arguments in the proof [see also Wied and Galeano, (2013)]. With these functions the limit distribution in (2.12) can be simplified to an expression that is more easily tractable via simulations. Straightforward calculations yield that Assumption 2.4 is satisfied by the function $w_{\gamma}\,$ and the limit distribution in Theorem 2.7 simplifies as follows.

Corollary 2.9

For a $p$ -dimensional Brownian motion $W$ with independent components it holds that

[TABLE]

Remark 2.10

The cumulative distribution function of the random variable on the right-hand side in Corollary 2.9 is hard to derive in general. However, in the case of $\gamma=0$ and dimension $p=1$ , an explicit formula can be obtained. To this end, note that (if we ignore the cutoff constant $\varepsilon$ ) the following identity holds with probability one

[TABLE]

where the distribution on the right-hand side is known as the Range of a Brownian motion [see for instance Feller, (1951)]. Its distribution function can be found in Borodin and Salminen, (1996, p. 146) and is given by

[TABLE]

where $\Phi$ denotes the c.d.f. of a standard Gaussian random variable. A corresponding result holds for the limit distribution in a closed-end scenario, see Section C of the online appendix [see Gösmann et al., (2020)], where an additional parameter in the distribution function is associated with the monitoring length.

For the investigation of the consistency of the monitoring scheme (2.2) we require the following assumption.

Assumption 2.11

Under the alternative $H_{1}$ defined in (1.2) let

[TABLE]

where the position of the change within the monitoring data $k_{m}^{*}\in\mathbb{N}$ may depend on $m$ . For the size of change suppose that

[TABLE]

Further assume that the process $\{\mathcal{IF}_{t}\}_{t\in\mathbb{Z}}$ and the remainders defined in Assumption 2.5 are of the following order before the change

[TABLE]

For the period following the change, assume that there exists a constant $c_{a}>0$ and distinguish two cases:

(1)

If $k^{*}_{m}/m=\mathcal{O}(1)$ , suppose that

[TABLE]

and for the cutoff constants in (2.10), assume that $t_{w}<k_{m}^{*}/m+c_{a}\leq T_{w}$ .

(2)

If $k^{*}_{m}/m\to\infty$ , suppose that

[TABLE]

Assume additionally that the weight function satisfies $T_{w}=\infty$ and

[TABLE]

Remark 2.12

The assumptions stated above are substantially weaker than those used to investigate the asymptotic properties of $\sup_{k=1}^{\infty}w(\tfrac{k}{m})\hat{E}_{m}(k)$ under the null hypothesis. Basically, we only assume reasonable behavior of the time series before and after the change point and can drop the uniform approximation in Assumption 2.3 and the uniform negligibility of the remainders in Assumption 2.5. It is easy to see that the conditions on the sequence $\mathcal{IF}_{t}$ are already satisfied if both its phases, before and after the change, fulfill a central limit theorem. Finally, it is worth mentioning that the assumptions for the change position $k^{*}_{m}$ and size $|\theta^{(1)}_{m}-\theta^{(2)}_{m}|$ are very flexible as we allow both quantities to depend on $m$ , where the latter can also tend to zero (sufficiently slowly as $m\to\infty$ ).

For early changes, that is $k_{m}^{*}/m=\mathcal{O}(1)$ , it is obvious that the change has to occur before monitoring is stopped, where the inequality $k^{*}_{m}/m\leq T_{w}-c_{a}$ ensures that there is enough data, such that it can actually be detected. On the other hand, the motivation for the inequality $t_{w}<k_{m}^{*}/m+c_{a}$ is slightly more technical. Roughly speaking, it guarantees that the time frame $m+k^{*}_{m},\dots,m+k_{m}^{*}+c_{a}m$ , which follows the change, is not completely covered by the weight function’s cutoff at monitoring start. For exactly this time frame we know by assertion (2.17) that the time series still behaves reasonably.

For late changes, that is $k^{*}_{m}/m\to\infty$ , Assumption 2.11 does not allow a cutoff ( $T_{w}<\infty$ ) in the weight function. Here we rely on the extra assumption in (2.19), which defines a lower bound for the growth rate of the weight function. Heuristically, this is necessary as it guarantees that a sufficient amount of weight is assigned even to late time points. The reader should note that this assumption is obviously fulfilled by the standard weighting defined in (2.14).

The next theorem yields consistency under the alternative hypothesis.

Theorem 2.13

Assume that the alternative hypothesis (1.2) and Assumptions 2.4 and 2.11 hold. If further $\hat{\Sigma}_{m}$ is non-singular and weakly convergent to a non-singular, deterministic matrix, it holds that

[TABLE]

Consequently, $\lim\limits_{m\to\infty}\mathbb{P}\Big{(}\sup\limits_{k=1}^{\infty}w(k/m)\hat{E}_{m}(k)>c\Big{)}=1$ holds for any constant $c\in\mathbb{R}$ .

3 Some specific change point problems

In this section we briefly illustrate how the theory developed in Section 2 can be employed to construct monitoring schemes for a specific parameter of the distribution function. For the sake of brevity we restrict ourselves to the mean and the parameters in a linear model. Other examples such as the variance or quantiles can be found in Dette and Gösmann, (2019).

3.1 Changes in the mean

The sequential detection of changes in the mean

[TABLE]

has been extensively discussed in the literature [see Aue and Horváth, (2004), Fremdt, (2014) or Hoga, (2017) among many others].

It is easy to verify (and well known) that the influence function for the mean is given by

[TABLE]

and Assumption 2.5 and the corresponding parts of Assumption 2.11 are obviously satisfied in this case since we have $R_{i,j}=0$ for all $i,j$ . For the remaining assumptions in Section 2 it now suffices that the centered time series $\big{\{}X_{t}-\mathbb{E}[X_{t}]\big{\}}_{t\in\mathbb{Z}}$ fulfills Assumption 2.3, which also implies the remaining part of Assumption 2.11 [see also the discussion in Remark 2.6]. In this situation both, Theorem 2.7 and Theorem 2.13 are valid provided that the chosen weighting fulfills Assumption 2.4.

3.2 Changes in linear models

Consider the time-dependent linear model

[TABLE]

where the random variables $\{P_{t}\}_{t\in\mathbb{N}}$ are the $\mathbb{R}^{p}$ -valued predictors, $\beta_{t}\in\mathbb{R}^{p}$ is a $p$ -dimensional parameter and $\{\varepsilon_{t}\}_{t\in\mathbb{N}}$ is a centered random sequence independent of $\{P_{t}\}_{t\in\mathbb{N}}$ . The identification of changes in the vector of parameters in the linear model represents the prototype problem in sequential change point detection as it has been extensively studied in the literature [see Chu et al., (1996), Horváth et al., (2004), Aue et al., (2009b), Fremdt, (2015), among many others].

This situation is covered by the general theory developed in Section 2 and 3. To be precise let

[TABLE]

be the joint vectors of predictor and response with (joint) distribution function $F_{t}$ , such that the marginal distributions of $Y_{t}$ and $P_{t}$ are given by

[TABLE]

respectively, where we will assume that the predictor sequence is strictly stationary, that is $F_{t,P}=F_{P}$ . In a first step we will consider the case, where the moment matrix

[TABLE]

is known (we will discuss later on why this assumption is non-restrictive) and non-singular. In this setup, the parameter $\beta_{t}$ can be represented as a functional of the distribution function $F_{t}$ , that is

[TABLE]

which leads to the estimators

[TABLE]

from the sample $(P_{i},Y_{i}),\dots,(P_{j},Y_{j})$ . To compute the influence function, let $(\rho,y)\in\mathbb{R}^{p}\times\mathbb{R}$ , then

[TABLE]

which is the influence function (for $\beta$ ) in the linear model stated above [see for example Hampel et al., (1986) for a comprehensive discussion on influence functions]. In the following, we will use the notation $\mathcal{IF}_{t}=\mathcal{IF}\big{(}X_{t},F_{t},\beta\big{)}$ again. Note that

[TABLE]

which directly gives $\mathbb{E}[\mathcal{IF}_{t}]=0$ . Assuming additionally stationarity of $\{\varepsilon_{t}\}_{t\in\mathbb{N}}$ , it follows that the random sequence $\{X_{t}\}_{t\in\mathbb{N}}$ is stationary under the null hypothesis. In this case, the linearization defined in (2.7) simplifies to

[TABLE]

Consequently, the remainders in (2.7) vanish and Assumption 2.5 is obviously satisfied. Next, note that the long-run variance matrix is given by

[TABLE]

with $\Gamma=\sum_{t\in\mathbb{Z}}\operatorname{Cov}\big{(}Y_{0}P_{0},~{}Y_{t}P_{t}\big{)}$ , which can be estimated by $\hat{\Sigma}_{m}=M^{-1}\hat{\Gamma}M^{-1}$ where $\hat{\Gamma}$ is an estimator for $\Gamma$ . Observing (3.5) it is now easy to see that in the resulting statistic $\hat{E}_{m}$ the matrix $M$ cancels out, that is

[TABLE]

and for this reason it does not depend on the matrix $M$ . We therefore obtain the following result, which describes the asymptotic properties of the monitoring scheme based on the statistic $\hat{E}_{m}$ for a change in the parameter in the linear regression model (3.1). The proof is a direct consequence of Theorems 2.7 and 2.13.
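The cancellation of $M$ can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it assumes that the difference of two estimators has the form $M^{-1}(a-b)$ for some vectors $a,b$ of averaged products $P_{t}Y_{t}$, and that the statistic is built from the quadratic form of this difference with respect to $\hat{\Sigma}_{m}^{-1}=(M^{-1}\hat{\Gamma}M^{-1})^{-1}$. The function name `check_m_cancels` and all inputs are hypothetical and randomly generated.

```python
import numpy as np

def check_m_cancels(p=2, seed=1):
    """Compare the quadratic form computed with and without the matrix M.

    Assumption (not taken verbatim from the paper): the estimator
    difference is M^{-1}(a - b) and Sigma = M^{-1} Gamma M^{-1}.
    """
    rng = np.random.default_rng(seed)
    # Random symmetric positive definite stand-ins for M and Gamma.
    A = rng.standard_normal((p, p))
    M = A @ A.T + p * np.eye(p)
    B = rng.standard_normal((p, p))
    Gamma = B @ B.T + np.eye(p)
    # Stand-ins for two averages of P_t * Y_t over different segments.
    a = rng.standard_normal(p)
    b = rng.standard_normal(p)

    M_inv = np.linalg.inv(M)
    diff_beta = M_inv @ (a - b)          # difference of the two estimators
    Sigma = M_inv @ Gamma @ M_inv        # long-run variance in terms of M
    with_m = diff_beta @ np.linalg.solve(Sigma, diff_beta)
    without_m = (a - b) @ np.linalg.solve(Gamma, a - b)
    return with_m, without_m

with_m, without_m = check_m_cancels()
print(with_m, without_m)  # the two numbers agree up to rounding error
```

Since $M$ is symmetric, $\Sigma^{-1}=M\Gamma^{-1}M$, so both quadratic forms reduce to $(a-b)^{\top}\Gamma^{-1}(a-b)$.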

Corollary 3.1

Assume that the predictor sequence $\{P_{t}\}_{t\in\mathbb{N}}$ and the centered sequence $\{\varepsilon_{t}\}_{t\in\mathbb{N}}$ are strictly stationary and the second moment matrix $M=\mathbb{E}[P_{1}P_{1}^{\top}]$ is non-singular. Further suppose that the sequences $\{P_{t}\}_{t\in\mathbb{N}}$ and $\{\varepsilon_{t}\}_{t\in\mathbb{N}}$ are independent and let the weight function under consideration fulfill Assumption 2.4.

(i)

Under the null hypothesis $H_{0}$ of no change, it follows that the sequence $\{\mathcal{IF}_{t}\}_{t\in\mathbb{N}}$ defined in (3.4) is strictly stationary. Assume further that this sequence admits the approximation in Assumption 2.3 and that $\hat{\Gamma}_{m}$ is a non-singular, consistent estimator of the non-singular long-run variance matrix $\Gamma$ defined in (3.6). Then monitoring based on the statistic $\hat{E}$ in (3.7) is an asymptotic level $\alpha$ procedure.

(ii)

Under the alternative hypothesis $H_{1}$ suppose that Assumption 2.11 is fulfilled. If further $\hat{\Gamma}$ is non-singular and weakly convergent to a non-singular, deterministic matrix, the monitoring based on the statistic $\hat{E}$ in (3.7) is consistent.

Remark 3.2

If one replaces (the unknown) moment matrix $M$ on the right-hand side of (3.3) by an appropriate estimate, that is

[TABLE]

one obtains a modified statistic given by

[TABLE]

where $\hat{\hat{\Sigma}}_{m}$ denotes an appropriate long-run variance estimator. Note that in this situation the dependence on the unknown moment matrix $M$ (or its estimators) cannot cancel out as observed in (3.7). The modified statistic can be reasonable to employ if - for example - possible changes in the distribution of $P_{t}$ have to be taken into account. However, as equation (3.8) illustrates, the modified statistic $\hat{\hat{E}}$ can equivalently be written as a weighted residual-based approach. This kind of phenomenon is already known in the literature, as Hušková and Koubková, (2005) describe this for a similar statistic in linear models.

4 Finite sample properties

In this section we investigate the finite sample properties of our monitoring procedure and demonstrate its superiority with respect to the available methodology. We choose the following two statistics as our benchmark

[TABLE]

The procedure based on $\hat{Q}$ was originally proposed by Horváth et al., (2004) for detecting changes in the parameters of linear models and since then reconsidered for example by Aue et al., (2012), Wied and Galeano, (2013) and Pape et al., (2016) for the detection of changes in the CAPM-model, correlation and variances, respectively. A statistic of the type $\hat{P}$ was recently proposed by Fremdt, (2015) and has been already reconsidered by Kirch and Weber, (2018). In the simulation study we will restrict ourselves to the commonly used class of weight functions $w_{\gamma}$ defined in (2.14), where we set the involved, technical constant $\varepsilon=10^{-10}$ when computing the statistics. Under the assumptions made in Section 2, it can be shown by similar arguments as given in Section A of the online appendix [see Gösmann et al., (2020)] that

[TABLE]

and

[TABLE]

where $W$ denotes a $p$ -dimensional Brownian motion. For detailed proofs (under slightly different assumptions) of (4.2) and (4.3), the reader is referred to Horváth et al., (2004) and Fremdt, (2015), where procedures of these types are considered in the special case of a linear model.

Recall the notation of $L_{1,\gamma}$ introduced in Corollary 2.9. By (4.2), (4.3) and Corollary 2.8 the necessary critical values for the procedures $\hat{E}$ , $\hat{Q}$ and $\hat{P}$ combined with weighting $w_{\gamma}$ are given as the $(1-\alpha)$ -quantiles of the distributions $L_{1,\gamma}$ , $L_{2,\gamma}$ and $L_{3,\gamma}$ , respectively and can easily be obtained by Monte Carlo simulations. The quantiles are listed in Table 1 for dimensions $p=1$ and $p=2$ and have been calculated by $10000$ runs simulating the corresponding distributions where the underlying Brownian motions have been approximated on a grid of $5000$ points. In Sections 4.1 and 4.2 below, we will examine the finite sample properties of the three statistics for the detection of changes in the mean and in the regression coefficients of a linear model, respectively. All subsequent results presented in these sections are based on 1000 independent simulation runs and a fixed test level of $\alpha=0.05$ .
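The Monte Carlo step can be sketched for the simplest case $\gamma=0$ and $p=1$, where Remark 2.10 identifies the limit with the range of a Brownian motion. The Python sketch below is an illustration, not the authors' code: it assumes the range is taken over the unit interval, approximates the Brownian motion by a scaled Gaussian random walk, and uses a smaller number of runs and grid points than the $10000$ and $5000$ reported above to keep it light. The function name `simulate_range_quantile` is hypothetical.

```python
import numpy as np

def simulate_range_quantile(alpha=0.05, n_runs=2000, n_grid=1000, seed=0):
    """Monte Carlo (1 - alpha)-quantile of max W - min W on [0, 1].

    Assumption: the limit for gamma = 0, p = 1 is the range of a
    standard Brownian motion on the unit interval.
    """
    rng = np.random.default_rng(seed)
    # Gaussian increments with variance 1 / n_grid per grid step.
    increments = rng.standard_normal((n_runs, n_grid)) / np.sqrt(n_grid)
    paths = np.cumsum(increments, axis=1)
    # Prepend W(0) = 0 so that the range always includes the origin.
    paths = np.concatenate([np.zeros((n_runs, 1)), paths], axis=1)
    ranges = paths.max(axis=1) - paths.min(axis=1)
    return np.quantile(ranges, 1.0 - alpha), ranges

critical_value, ranges = simulate_range_quantile()
print(critical_value)
```

A finer grid reduces the downward discretization bias of the simulated maximum and minimum, which is one reason for the comparatively large grid used for Table 1.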

4.1 Changes in the mean

In this section we will compare the finite sample properties of the procedures based on the statistics $\hat{E},\hat{P}$ and $\hat{Q}$ for changes in the mean as outlined in Section 3.1. Here we test the null hypothesis of no change which is given by

[TABLE]

while the alternative, that the parameter $\mu_{t}$ changes beyond the initial data set, is defined as

[TABLE]

We will consider four different data generating models, one white noise process and three autoregressive processes with different levels of temporal dependence controlled by the AR-parameter. To be precise we consider the models

(M1)

$X_{t}=\varepsilon_{t}$ ,

(M2)

$X_{t}=0.1X_{t-1}+\varepsilon_{t}$ ,

(M3)

$X_{t}=0.5X_{t-1}+\varepsilon_{t}$ ,

(M4)

$X_{t}=0.7X_{t-1}+\varepsilon_{t}$ ,

where $\{\varepsilon_{t}\}$ is an i.i.d. sequence of standard Gaussian random variables. For the AR(1)-processes defined in models (M2)-(M4), we create a burn-in sample of 100 observations in the first place. To simulate the alternative hypotheses, changes in the mean are added to the data, that is

[TABLE]

where $\delta=\mathbb{E}[X_{m+k^{*}}]-\mathbb{E}[X_{m+k^{*}-1}]$ denotes the desired change amount. For the necessary long-run variance estimation we employ the well-known quadratic spectral estimator [see Andrews, (1991)] with its implementation in the R-package ‘sandwich’ [see Zeileis, (2004)]. To take into account the possible appearance of changes, only the initial stable segment $X_{1},\dots,X_{m}$ is used for this estimate. This restriction is standard in the literature [see for example Horváth et al., (2004), Wied and Galeano, (2013), or Dette and Gösmann, (2019) among many others], and we will briefly discuss ideas to improve this in our outlook in Section 6. The bandwidth involved in the estimator is chosen as $\log_{10}(m)$ for models (M1) and (M2). To take into account the stronger temporal dependence we take a bandwidth of $\log_{10}(m^{4})$ for the models (M3) and (M4).
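The data generating step for models (M1)-(M4) can be sketched as follows. This is a minimal Python illustration, not the code used for the study (which relies on R): it assumes that the change is a level shift of size $\delta$ added to every observation from time $m+k^{*}$ onwards, in line with the definition of $\delta$ above. The function name and its arguments are hypothetical.

```python
import numpy as np

def simulate_ar1_with_mean_change(n, phi, m, k_star=None, delta=0.0,
                                  burn_in=100, seed=None):
    """Simulate X_t = phi * X_{t-1} + eps_t and add a mean shift.

    phi = 0 gives the white noise model (M1); phi = 0.1, 0.5, 0.7
    give (M2)-(M4). Assumption: the shift delta is added to all
    observations X_{m+k_star}, X_{m+k_star+1}, ... (1-based indexing).
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(burn_in + n)
    x = np.empty(burn_in + n)
    prev = 0.0
    for t in range(burn_in + n):
        prev = phi * prev + eps[t]
        x[t] = prev
    x = x[burn_in:]                   # discard the burn-in sample
    if k_star is not None:
        x[m + k_star - 1:] += delta   # 0-based index of X_{m+k_star}
    return x

# Example: model (M3) with m = 100 and a change of 0.3 at k* = 50.
x = simulate_ar1_with_mean_change(n=3000, phi=0.5, m=100,
                                  k_star=50, delta=0.3, seed=42)
```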

In Tables 2 and 3 we display the type I error for the four time series models (M1)-(M4) and different choices of $\gamma$ in the weight functions. The principal observation for Table 2 is that all three statistical procedures offer a reasonable approximation of the desired nominal level of $\alpha=0.05$ for the models (M1) and (M2). The results for the weakly dependent model (M2) are slightly worse than those for the white noise model (M1).

In Table 3 it can be seen that the approximation of the nominal level is quite imprecise for the more strongly dependent models (M3) and (M4), especially for an initial sample size of $m=100$ . This effect seems to be primarily caused by a less precise estimation of the long-run variance and the approximation improves with larger initial sample size $m$ , such that the type I error is considerably closer to $5\%$ for $m=400$ . To support this conjecture, we also report the type I error for simulations, in which we used the true long-run variance instead of an estimate, in Table 3. This demonstrates a much more sound approximation of the desired test level of $5\%$ for $m=100$ and $m=200$ .

To discuss the performance under the alternative we illustrate the power of the procedures for increasing values of the change and different change positions for models (M1) and (M2) with $m=100$ in Figure 1 and for models (M3) and (M4) with $m=200$ in Figure 2. As the results are very similar we only report the choice $\gamma=0$ here and provide results for $\gamma=0.45$ in the online appendix. The basic tendency observable in Figures 1 and 2 is concordant: While the procedures behave similarly for a change close to the initial data set (first row), the method based on $\hat{E}$ is clearly superior to the others the more the distance to the initial set grows.

The advantage in power is not visible for changes occurring close to the initial training set, where the other procedures perform slightly better.

To give an example, consider the right plot of the first row in Figure 1. Here the test based on the statistic $\hat{E}$ already has a power of 62.8% for a change of $\delta=0.3$ , whereas the tests based on the statistics $\hat{P}$ and $\hat{Q}$ have power of 43.7% and 42.4%, respectively. The superior performance of $\hat{E}$ can most likely be explained by the more accurate estimate of the pre-change parameter by $\hat{\theta}_{1}^{m+j}$ , while the other statistics only involve the estimator $\hat{\theta}_{1}^{m}$ [see formulas (2.2) and (4.1)].
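The difference between the two types of statistics can be illustrated for a univariate mean. The Python sketch below is written under explicit assumptions, since the exact definitions are given in formulas (2.2) and (4.1): it assumes that the $\hat{E}$-type detector is $m^{-1/2}\max_{0\le j<k}(k-j)\,|\hat{\theta}_{1}^{m+j}-\hat{\theta}_{m+j+1}^{m+k}|/\hat{\sigma}$ and that the $\hat{Q}$-type detector is $m^{-1/2}k\,|\hat{\theta}_{1}^{m}-\hat{\theta}_{m+1}^{m+k}|/\hat{\sigma}$, with sample means as estimators and without the weight function. The function names are hypothetical.

```python
import numpy as np

def e_type_statistic_mean(x, m, k, sigma=1.0):
    """Maximum over all split points j = 0, ..., k-1 (assumed scaling)."""
    x = np.asarray(x, dtype=float)
    if len(x) < m + k:
        raise ValueError("need at least m + k observations")
    csum = np.concatenate([[0.0], np.cumsum(x[:m + k])])
    j = np.arange(k)
    left_mean = csum[m + j] / (m + j)                    # mean of X_1..X_{m+j}
    right_mean = (csum[m + k] - csum[m + j]) / (k - j)   # mean of X_{m+j+1}..X_{m+k}
    return np.max((k - j) * np.abs(left_mean - right_mean)) / (np.sqrt(m) * sigma)

def q_type_statistic_mean(x, m, k, sigma=1.0):
    """Comparison with the training-sample mean only (assumed scaling)."""
    x = np.asarray(x, dtype=float)
    if len(x) < m + k:
        raise ValueError("need at least m + k observations")
    diff = np.mean(x[:m]) - np.mean(x[m:m + k])
    return k * abs(diff) / (np.sqrt(m) * sigma)

rng = np.random.default_rng(0)
x = rng.standard_normal(600)
x[400:] += 0.5                       # late mean change, illustrative only
print(e_type_statistic_mean(x, m=100, k=500),
      q_type_statistic_mean(x, m=100, k=500))
```

Under these assumed definitions the split point $j=0$ reproduces the $\hat{Q}$-type value, so the $\hat{E}$-type value is never smaller at the same $k$. The critical values of the two procedures differ, however, so this alone does not imply higher power.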

For the sake of an appropriate understanding of our findings, the reader should be aware of the fact that - although we consider open-end procedures here - simulations have to be stopped eventually. Here we chose this stopping point as $1000$ ( $m=50$ ), $3000$ ( $m=100$ , $m=200$ ) or $4000$ ( $m=400$ ) observations and it is to be expected that the testing power of all procedures increases with a later stopping point. Therefore the observed superiority of $\hat{E}$ refers to the type II error until the specified stopping point.

The theory developed in Section 2 also covers the case with a preselected end of the monitoring period. While the statistic for monitoring is the same, the quantile is chosen differently leading to a detector that has higher power if the change is included in the monitoring window and no power if the true change occurs after monitoring ends. We discuss this in the online appendix.

4.2 Changes in linear models

In this section we present some simulation results for the detection of changes in the linear model (3.1). We aim to detect changes in the unknown parameter vector $\beta_{t}\in\mathbb{R}^{p}$ by testing the null hypothesis

[TABLE]

against the alternative that the parameter $\beta_{t}$ changes beyond the initial data set, that is

[TABLE]

To be precise, we consider the model (3.1) with $p=2$ and the following choice of predictors

(LM1)

$P_{t}=(1,\sqrt{0.5}Z_{t})^{\top}$ ,

(LM2)

$P_{t}=(1,1+G_{t})^{\top}$ with $G_{t}=\bar{\sigma}_{t}Z_{t}$ and $\bar{\sigma}^{2}_{t}=0.5+0.2Z_{t-1}+0.3\bar{\sigma}^{2}_{t-1}$ ,

where $Z_{t}$ denotes an i.i.d. sequence of $\mathcal{N}(0,1)$ random variables in both models. The parameter vector is fixed at $\beta_{t}=(1,1)$ under the null hypothesis and to examine the alternative hypothesis, changes are added to its second component, that is

[TABLE]

For both scenarios we simulated the residuals $\varepsilon_{t}$ in model (3.1) as i.i.d. $\mathcal{N}(0,0.5)$ sequences. Note that the GARCH(1,1) model (LM2) has been already considered by Fremdt, (2015). As pointed out in Section 3.2 the asymptotic variance that needs to be estimated within our procedures is given by

[TABLE]

We estimate this quantity based on the stable segment of observations $(Y_{1},P_{1}),\dots,(Y_{m},P_{m})$ using the well known quadratic spectral estimator [see Andrews, (1991)] with its implementation in the R-package ‘sandwich’ [see Zeileis, (2004)].

The problem of detecting changes in the parameter of the linear model has also been addressed using partial sums of the residuals $\hat{\varepsilon}_{t}=Y_{t}-P^{\top}_{t}\hat{\beta}_{I}~{}$ in statistics similar to (4.1), where $\hat{\beta}_{I}$ is an initial estimate of $\beta$ computed from the initial stable segment. We refer for instance to Chu et al., (1996), Horváth et al., (2004), who - among many others - use statistics similar to $\hat{Q}$ , or Fremdt, (2015), who uses a statistic similar to $\hat{P}$ . Our approach directly compares estimators for the vector $\beta_{t}$ , which are derived using the general methodology introduced in Sections 2 and 3. The resulting statistics are obtained replacing $\hat{\theta}$ by $\hat{\beta}$ in equation (4.1). As pointed out in Remark 3.2, there is a strong connection between methods comparing direct estimates and methods based on weighted residuals, which was already described by Hušková and Koubková, (2005). These authors, in particular, demonstrate that these approaches exhibit power against alternatives, that the plain residual-based statistics fail to distinguish from the null hypothesis. We also refer to Leisch et al., (2000), Hušková and Koubková, (2005) and Hušková et al., (2007) for a comparison of (plain) residual-based methods with methods using the estimators directly.

In Table 4 we display the approximation of the nominal level for the three statistics with different values of the parameter $\gamma$ in the weight function, where monitoring was stopped after $1500$ observations. We observe an acceptable approximation of the nominal level 5% in the case $\gamma=0$ , while the rejection probabilities for $\gamma=0.25$ or $\gamma=0.45$ slightly exceed the desired level of 5%. The fact that larger values of $\gamma\in[0,1/2)$ can lead to a worse approximation of the desired type I error has also been observed by other authors [see, for example, Wied and Galeano, (2013)] and can be explained by a more sensitive weight function at the monitoring start if $\gamma$ is chosen close to $1/2$ . Overall, the approximation is slightly better for the independent case in model (LM1).

In Figure 3 we compare the power with respect to the change amount for different change positions, where we restrict ourselves to the case $\gamma=0$ for the sake of brevity. The results are very similar to those provided for the mean functional in Section 4.1. Again the monitoring scheme based on $\hat{E}$ outperforms the procedures based on $\hat{Q}$ and $\hat{P}$ , and the superiority is larger for a later change. We omit a detailed discussion and summarize that the empirical findings have indicated superiority (w.r.t. testing power) of the monitoring scheme based on the statistic $\hat{E}$ .

5 Two applications

In this section, we apply our methodology alongside two competitors to monitor for changes in linear models. We discuss two examples related to the United Kingdom European Union membership referendum 2016. We consider the linear model

[TABLE]

where $Y_{t}$ is a real-valued response and $(P_{1,t},P_{2,t})$ is a two-dimensional predictor, which is a special case of the linear model considered in Section 3.2.

Recall that our approach requires a stable segment of $m$ observations in which no changes have yet happened. We choose a stable segment of size $m=20$ for our analysis of the data and monitor with the three detectors $\hat{E}$ , $\hat{P}$ and $\hat{Q}$ defined in (2.2) and (4.1), respectively. More precisely, the detectors are updated for every incoming observation, namely $(Y_{t},P_{1,t},P_{2,t})$ , and a decision is made, by comparing the detectors with the corresponding thresholds, whether to reject the null hypothesis and stop the procedure or to continue monitoring with the subsequent observation. Monitoring then continues until a change has been detected by each of the three approaches.

For the next monitoring phase another $m$ observations from the time where the last of the three detectors has rejected are used as the next stable segment. Monitoring ends once the end of the available data is reached.
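The restart logic described above can be sketched as a simple loop. The following Python sketch is only an illustration of the phase bookkeeping, not the authors' implementation: each detector is represented by a hypothetical callable that receives the start of the stable segment, the start of monitoring and the number of available observations, and returns the (0-based) index at which it rejects, or `None` if it does not reject before the data ends. The actual detectors $\hat{E}$ , $\hat{P}$ and $\hat{Q}$ would have to be supplied by the user.

```python
def monitor_with_restarts(n_obs, m, detectors):
    """Run repeated monitoring phases with stable segments of size m.

    detectors: dict mapping a name to a callable
        (stable_start, monitor_start, n_obs) -> rejection index or None.
    Returns a list of phases with their stable segment and rejections.
    """
    phases = []
    start = 0
    while start + m < n_obs:
        monitor_start = start + m
        rejections = {name: detect(start, monitor_start, n_obs)
                      for name, detect in detectors.items()}
        phases.append({"stable": (start, monitor_start),  # half-open range
                       "rejections": rejections})
        # Stop if some detector reaches the end of the data without rejecting.
        if any(r is None for r in rejections.values()):
            break
        last = max(rejections.values())
        if last < monitor_start:
            raise ValueError("a detector rejected before monitoring started")
        # The next stable segment begins at the time of the last rejection.
        start = last
    return phases

# Toy detectors with preset rejection times, for illustration only.
def preset(times):
    def detect(stable_start, monitor_start, n_obs):
        later = [t for t in times if monitor_start <= t < n_obs]
        return later[0] if later else None
    return detect

phases = monitor_with_restarts(100, 5, {"A": preset([20, 60]),
                                        "B": preset([25])})
```

In the toy example the first phase ends at index 25, the time of the last rejection, and the second phase uses the observations with indices 25 to 29 as its stable segment.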

In the remaining part of this section, we present the outcomes of the previously described statistical analysis for two data sets related to the United Kingdom (UK) European Union (EU) membership referendum, which took place on 23 June 2016. For our analysis we chose the significance levels to be $\alpha=0.05$ and the weight function $w_{0}$ , as defined in (2.14). All data used was obtained from https://www.ariva.de on 26 March 2020.

As our first example, we consider the relation of the UK’s currency, Pound Sterling (GBP), to the Eurozone’s currency, the Euro (EUR), and Switzerland’s currency, the Swiss franc (CHF). More precisely, we consider daily log returns of the exchange rate of GBP to the United States dollar (USD) as a response $Y_{t}$ of a linear model as described in (5.1). As predictors we now consider the log returns of EUR to USD ( $P_{1,t}$ ) and CHF to USD ( $P_{2,t}$ ). A graphical representation of the exchange rates and associated log returns for the period from January 2016 to December 2019 can be seen in Figure 4. The outcomes of the previously described analysis are presented visually in the graphs. The first 20 observations (4 Jan 2016 to 29 Jan 2016, note that we only considered trading days FXCM) were used as the stable segment for monitoring. The monitoring starts on 1 Feb 2016 and continues with all three detectors until 17 Mar 2016 when $\hat{P}$ and $\hat{Q}$ reject, but $\hat{E}$ does not yet reject. Monitoring continues with $\hat{E}$ only until 29 Mar 2016 when the first phase of monitoring ends as all three monitoring procedures have rejected the null hypothesis. The monitoring procedure is then restarted with the $20$ observations from the time of rejection (29 Mar 2016 to 25 Apr 2016) as the stable segment and monitoring continues from 26 Apr 2016 until 23 Jun 2016 (day of the UK EU referendum), when $\hat{E}$ and $\hat{P}$ reject. After these rejections, monitoring continues for one more day, until 24 Jun 2016, when also $\hat{Q}$ rejects. Finally, the monitoring procedure is restarted with the next 20 observations (24 Jun 2016 to 21 Jul 2016) and monitoring continues until 31 Dec 2019 without rejections by any of the three detectors. In this example, we see that the three detectors behave quite similarly, as each of them rejects twice around the time of the UK EU referendum and detects no further changes afterwards.

As our second example, we consider the relation of the UK’s market to that of the United States (US) and the EU. More precisely, we consider daily log returns of the FTSE 100, a share index of the 100 companies listed on the London Stock Exchange with the highest market capitalization, as a response $Y_{t}$ of the linear model described in (5.1). As predictors we consider the log returns of two similarly constructed indices that are related to the US and EU markets, namely the S&P 500 ( $P_{1,t}$ ) and the EuroStoxx 50 ( $P_{2,t}$ ). A graphical representation of the prices and log returns for the period from January 2016 to December 2019 can be seen in Figure 5. The outcomes of the previously described analysis are presented visually in the graphs. The first 20 observations (6 Jan 2016 to 1 Feb 2016) were used as the stable segment for the first phase of monitoring. The monitoring starts on 2 Feb 2016 and continues with all three detectors until 6 Feb 2017 when $\hat{E}$ rejects, but $\hat{P}$ and $\hat{Q}$ do not yet reject. Monitoring continues with only $\hat{P}$ and $\hat{Q}$ until 16 Mar 2017 when the first phase of monitoring ends with $\hat{P}$ and $\hat{Q}$ also rejecting. In Figure 5, the time that was only monitored by $\hat{P}$ and $\hat{Q}$ is not shaded in gray, because $\hat{E}$ has already rejected.

For the second phase of monitoring the procedures are then restarted with the 20 observations from the time of rejection (16 Mar 2017 to 7 Apr 2017) as the stable segment and monitoring continues from 10 Apr 2017 until 18 Apr 2017 when $\hat{P}$ rejects. Monitoring continues with only $\hat{E}$ and $\hat{Q}$ until 24 Apr 2017 when the second phase of monitoring ends with $\hat{E}$ and $\hat{Q}$ both also rejecting.

For the third phase of monitoring the procedures are then restarted again with the 20 observations from the time of rejection (24 Apr 2017 to 18 May 2017) as the stable segment and monitoring continues from 19 May 2017 until 4 Dec 2018 when $\hat{E}$ rejects. Monitoring continues with $\hat{P}$ and $\hat{Q}$ only until 14 Aug 2019 when the third phase of monitoring ends with $\hat{P}$ and $\hat{Q}$ both also rejecting.

For the fourth and final phase of monitoring the procedures are then restarted again with the 20 observations from the time of rejection (14 Aug 2019 to 6 Sep 2019) as the stable segment and monitoring continues from 9 Sep 2019 until 11 Oct 2019 when $\hat{E}$ and $\hat{P}$ both reject. Monitoring continues with $\hat{Q}$ only until 31 Dec 2019, the end of the available data, without a rejection of $\hat{Q}$ . In this example, we see that $\hat{E}$ , as expected from the simulations, is capable of detecting changes earlier after a longer period of monitoring. Only in the second period, where the rejection happens early, this is not the case.
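The restart logic used in both data examples can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it tracks a single detector, whereas the analysis above restarts once all three detectors have rejected, and the names `monitor_with_restarts` and `threshold_detector` are hypothetical. The toy threshold rule merely stands in for $\hat{E}$, $\hat{P}$ or $\hat{Q}$ together with their critical values.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A detector receives the data of the current phase and the length m of the
# stable segment; it returns the index (relative to the phase) of the first
# rejection, or None if it never rejects.
Detector = Callable[[Sequence[float], int], Optional[int]]


def monitor_with_restarts(
    data: Sequence[float], first_rejection: Detector, m: int = 20
) -> List[Tuple[int, Optional[int]]]:
    """Return (phase start, rejection index or None) for every monitoring phase."""
    phases: List[Tuple[int, Optional[int]]] = []
    start = 0
    while start + m < len(data):
        hit = first_rejection(data[start:], m)
        if hit is None:
            phases.append((start, None))  # monitoring runs to the end of the data
            break
        if hit < m:
            raise ValueError("a rejection must occur after the stable segment")
        phases.append((start, start + hit))
        start += hit  # the rejection day opens the next stable segment
    return phases


def threshold_detector(
    segment: Sequence[float], m: int, threshold: float = 3.0
) -> Optional[int]:
    """Hypothetical toy rule: reject once an observation leaves a band around the stable mean."""
    baseline = sum(segment[:m]) / m
    for i in range(m, len(segment)):
        if abs(segment[i] - baseline) > threshold:
            return i
    return None
```

With real data one would pass a function that evaluates the weighted detector against its critical value instead of `threshold_detector`.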

6 Conclusion and outlook

In this paper we developed a new monitoring scheme for change point detection in a parameter of multivariate time series which is applicable in an open-end scenario. Compared to the commonly used methods we replace the estimator of the parameter from the initial sample $X_{1},\ldots,X_{m}$ by an estimator from the sample $X_{1},\ldots,X_{m+j}$ . We then compare this estimator with the estimator from the sample $X_{m+j+1},\ldots,X_{m+k}$ for every $j=0,\ldots,k-1$ . For the new statistic the asymptotic distribution under the null hypothesis and the consistency of a corresponding test, which controls the type I error, are established. By considering a common class of weight functions $w_{\gamma}$ defined in (2.14) the limit reduces to an elementary distribution, for which quantiles can be obtained by straightforward Monte Carlo simulations. Finally, we demonstrate by a comprehensive simulation study that the new monitoring scheme is superior (in terms of testing power) to a benchmark consisting of common methods proposed in the literature. The new statistic can also be used in closed-end scenarios, for which the same superiority in power is observed.
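To make the comparison over all split points concrete, the following Python sketch evaluates such a statistic for the simplest case of a univariate mean. The scaling by $(k-j)/\sqrt{m}$ and the division by a long-run standard deviation `sigma` are assumptions made for this illustration, since definition (2.2) is not reproduced in this section; the function name `detector_E` is hypothetical.

```python
import numpy as np


def detector_E(x, m, sigma):
    """Sketch of a split-point detector for a univariate mean.

    For every monitoring time k = 1, ..., len(x) - m the mean of
    X_1..X_{m+j} is compared with the mean of X_{m+j+1}..X_{m+k}
    for all j = 0, ..., k-1, and the largest scaled difference is kept.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    csum = np.concatenate(([0.0], np.cumsum(x)))  # csum[i] = x[0] + ... + x[i-1]
    out = np.empty(n - m)
    for k in range(1, n - m + 1):
        j = np.arange(k)                                 # split points 0..k-1
        left = csum[m + j] / (m + j)                     # mean before the split
        right = (csum[m + k] - csum[m + j]) / (k - j)    # mean after the split
        # Assumed scaling: (k - j) / sqrt(m), normalised by sigma.
        out[k - 1] = np.max((k - j) * np.abs(left - right)) / (sigma * np.sqrt(m))
    return out
```

In a monitoring procedure the value at time $k$ would be multiplied by the weight $w(k/m)$ and compared with a quantile of the limit distribution; the loop above costs $O(k)$ per time point, which is acceptable for a sketch but not optimised.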

For a future research project it is of interest to replace Assumption 2.3 by an FCLT for any fixed time horizon and Hájek-Rényi inequalities, as done for instance in Kirch and Weber, (2018) and Kirch and Stoehr, (2019). Since these conditions are slightly weaker, it would be a benefit to establish the results at hand under those conditions.

Another issue, particularly with regard to our simulation study in Section 4, is that the test level approximation depends sensitively on the efficient estimation of the long-run variance. The standard approach in our field, which we also followed, is to employ only the initial set for this estimate. As this approach performs poorly for more strongly dependent models, it is logical to take a permanently updated estimate into consideration, which fits the basic message of this work to enhance initial estimates during monitoring. Moreover, one could tackle this problem by developing a concept of self-normalization [see Shao and Zhang, (2010)], which is applicable in an open-end scenario. However, as the discussion of both ideas is technically involved, it is beyond the scope of this paper and left as a promising subject for future research.
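As a rough illustration of the two options, the sketch below implements a kernel estimator of the long-run variance for a univariate series. The Bartlett kernel and the bandwidth argument are choices made for this example only and are not necessarily the estimator used in Section 4; the function name `bartlett_lrv` is hypothetical.

```python
import numpy as np


def bartlett_lrv(x, bandwidth):
    """Bartlett-kernel estimate of the long-run variance of a univariate series.

    bandwidth = 0 returns the (biased) sample variance; larger values add
    weighted sample autocovariances up to lag `bandwidth`.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    lrv = np.dot(xc, xc) / n                      # lag-0 autocovariance
    for h in range(1, bandwidth + 1):
        gamma_h = np.dot(xc[h:], xc[:-h]) / n     # lag-h autocovariance
        lrv += 2.0 * (1.0 - h / (bandwidth + 1.0)) * gamma_h
    return lrv
```

The standard approach corresponds to calling `bartlett_lrv(x[:m], bandwidth)` once, while an updated estimate would call it on `x[:m + k]` at monitoring time `k`. Whether such an update preserves the asymptotic level is exactly the open question raised above.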

Finally, it is a logical next step to also characterize the asymptotic distribution for the stopping times based on the statistic $\hat{E}$ defined in (2.2). Corresponding results are already known for the methods based on $\hat{Q}$ and $\hat{P}$ , see Aue and Horváth, (2004) and Fremdt, (2014), respectively.

Acknowledgments This work has been supported in part by the Collaborative Research Center “Statistical modeling of nonlinear dynamic processes” (SFB 823, Teilprojekt A1, C1) and the Research Training Group ’High-dimensional phenomena in probability - fluctuations and discontinuity’ (RTG 2131) of the German Research Foundation (DFG). Moreover, the authors would like to thank Christina Stöhr and Herold Dehling for extremely helpful discussions. We are also grateful to the two unknown referees for their constructive comments on an earlier version of the paper and to Ivan Kojadinovic for pointing out some errors.

Appendix A Proofs of the results in Section 2

Proof of Theorem 2.7.

In the proof we use the following extra notation. Define the statistic

[TABLE]

where we have replaced the long-run variance estimator $\hat{\Sigma}_{m}$ by the (unknown) true long-run variance $\Sigma$ in definition (2.2). Further define

[TABLE]

where we have replaced the estimator $\hat{\theta}_{i}^{j}$ by corresponding averages of the influence function in (A.1). Throughout the proof we will frequently use that due to continuity of $\tilde{w}$ and $\limsup_{t\to\infty}t\tilde{w}(t)<\infty$ in Assumption 2.4, the weight function $w$ has a uniform upper bound, say $u_{w}$ . Finally, the triple $(\Omega,\mathcal{A},\mathbb{P})$ will denote the underlying probability space.

The proof itself is now split into several Lemmas A.1 - A.5. The first Lemma shows that $\tilde{E}_{m}(k)$ and $E_{m}(k)$ are (asymptotically) equivalent. Lemma A.2 will approximate $E_{m}(k)$ by Brownian motions, while Lemma A.3 then yields a limit for this approximation. Lemma A.4 finishes the proof by plugging in the covariance estimator, meaning that $\hat{E}_{m}(k)$ and $E_{m}(k)$ are asymptotically equivalent. Finally, Lemma A.5 will establish the other representation of the limit distribution in (2.12). In each Lemma, we suppose that the assumptions of Theorem 2.7 are valid.

Lemma A.1 (Remove Remainders)

It holds that

[TABLE]

as $m\to\infty$ .

Proof.

By the (reverse) triangle inequality and the linearization in (2.7) we obtain

[TABLE]

where we used that $\theta_{t}$ is constant for the last equality. Next, we obtain that

[TABLE]

Similarly to (A.3), it holds that

[TABLE]

Using Assumption 2.4 for the weight function $w$ we obtain

[TABLE]

and by similar arguments it holds that

[TABLE]

Further note that due to Assumption 2.5 with probability one

[TABLE]

Now combining (A.5), (A.6) and (A.7) the bounds derived in (A.3) and (A.4) are of order $o_{\mathbb{P}}(1)$ , which finishes the proof of Lemma A.1. ∎

For the proof of the next Lemma we can proceed (roughly) similarly to the proof of Lemma 5.2 in Fremdt, (2015).

Lemma A.2 (Approximation with Brownian motions)

Define

[TABLE]

then

[TABLE]

as $m\to\infty$ .

Proof.

For the remainder of the proof let $W_{m,i}^{\Sigma}:=\sqrt{\Sigma}W_{m,i}$ for $i=1,2$ and note that this implies

[TABLE]

The last display and the (reverse) triangle inequality then yield

[TABLE]

We will treat the three summands of the last display separately. Using the definition of the operator norm, we derive the following bound for the first summand

[TABLE]

By Assumption 2.3 and the estimate $\|x\|_{\Sigma^{-1}}\leq\|\Sigma^{-1}\|_{op}^{1/2}|x|$ for all $x\in\mathbb{R}^{p}$ , the second factor is of order $\mathcal{O}_{\mathbb{P}}(1)$ . Since $w$ has an upper bound $u_{w}$ we obtain for the first factor, that

[TABLE]

Next, we can bound the second summand on the right-hand side in (A.8) by

[TABLE]

where we used that $(m+j)=(m+j)^{\xi}(m+j)^{1-\xi}\geq j^{\xi}m^{1-\xi}$ . Using again Assumption 2.3, the second factor in the last display is of order $\mathcal{O}_{\mathbb{P}}(1)$ . Moreover, following the idea of the proof of Lemma 3 in Aue et al., (2006) it holds that

[TABLE]

and it remains to treat the third summand of the right-hand side in (A.8), which can be bounded by

[TABLE]

Using Assumption 2.3 and the arguments in (A.9) this term is of order $o_{\mathbb{P}}(1)$ , which finishes the proof of Lemma A.2. ∎

Lemma A.3 (Obtain limit process)

The following weak convergence holds

[TABLE]

as $m\to\infty$ , where $W_{1}$ and $W_{2}$ denote independent, $p$ -dimensional standard Brownian motions.

Proof.

First note that due to the scaling properties of the Brownian motion [see for example page 30 of Shorack and Wellner, (2009)], it holds in distribution that

[TABLE]

and so within the proof we will - without loss of generality - only consider $\widetilde{P}_{m}(k)$ . Additionally, we define the processes

[TABLE]

We will show that $L^{(i)}(s,t)$ for $i=1,2$ are uniformly continuous on $\mathbb{R}_{\Delta}^{+}=\{(s,t)\in\mathbb{R}^{2}\;|\;0\leq s\leq t\leq T_{w}\}$ with probability one, where $T_{w}\in\mathbb{R}_{>0}\cup\{\infty\}$ is the (right) cutoff constant from Assumption 2.4. In case $T_{w}<\infty$ this directly follows as both processes are already a.s. continuous, so only $T_{w}=\infty$ is of interest.

Case $L^{(1)}$ : In the following let $\varepsilon>0$ be fixed but arbitrary. Next, we fix one $\omega_{0}\in\Omega$ , such that $W_{1}$ fulfills the law of the iterated logarithm and is continuous. As this event has probability one, it suffices to show that $L^{(1)}$ is uniformly continuous for $W_{1}=W_{1}(\cdot,\omega_{0})$ . For ease of reading we will omit $\omega_{0}$ in the presentation below. By the law of the iterated logarithm, there exists $C=C(\varepsilon,\omega_{0})$ sufficiently large, such that

[TABLE]

with $B$ chosen as

[TABLE]

Depending on $C$ , we can split $\mathbb{R}_{\Delta}^{+}$ into the (overlapping) sets

[TABLE]

where we use the definitions

[TABLE]

Further let $d$ denote the maximum distance, that is,

[TABLE]

Note that by construction of the decomposition in (A.12), whenever $d((s_{1},t_{1}),(s_{2},t_{2}))<\delta$ for sufficiently small $\delta>0$ , then there is $j\in\{1,2,3\}$ , such that both pairs are in the same subset $\mathcal{M}_{j}(C)$ . Thus the uniform continuity of $L^{(1)}$ follows if we can choose $\delta>0$ sufficiently small, such that

[TABLE]

for $j=1,2,3$ . In the following, we will treat each subset separately.

Set $\mathcal{M}_{1}(C)$ : As this set is compact, the (ordinary) almost sure continuity of $L^{(1)}$ already implies that (A.13) holds for $j=1$ and $\delta>0$ sufficiently small.

Set $\mathcal{M}_{2}(C)$ : We have the following bound

[TABLE]

We will treat both summands of the last display individually. The first one can be bounded again as follows

[TABLE]

and again we will treat both terms separately. For the first term note the upper bound

[TABLE]

and since $1/\tilde{w}$ is uniformly continuous this expression is smaller than $\varepsilon/3$ for sufficiently small $\delta$ . For the second term of the right-hand side of (A.15), note that we have the bound

[TABLE]

and by the choice of $C$ in (A.11) and for sufficiently small $\delta$ this is bounded by $\varepsilon/3$ . To complete the treatment of $\mathcal{M}_{2}(C)$ it only remains to examine the second term on the right-hand side of (A.14). We obtain that

[TABLE]

which can be bounded by $\varepsilon/3$ for sufficiently small $\delta$ since the first factor is a constant and the function $f(s)=W_{1}(s+1)/(s+1)$ is uniformly continuous on the compact set $[0,C+1]$ .

Set $\mathcal{M}_{3}(C)$ : Note that

[TABLE]

where we used the choice of $C$ in (A.11) for the last estimate.

This completes the third case and so the almost sure uniform continuity of $L^{(1)}$ on the set $\mathbb{R}_{\Delta}^{+}$ is established.

Case $L^{(2)}$ : Again let $\varepsilon>0$ and suppose that $d\big{(}(s_{1},t_{1}),(s_{2},t_{2})\big{)}<\delta$ . It holds that

[TABLE]

and note for the first summand of the last display that

[TABLE]

and by Assumption 2.4 the last term is smaller than $\varepsilon/2$ uniformly for all $t_{1}>0$ if $\delta>0$ is chosen sufficiently small. It remains to examine the second summand of the right-hand side of (A.16). It holds that

[TABLE]

where we used that $s_{2}\leq t_{2}\leq t_{1}+\delta$ whenever $d\big{(}(s_{1},t_{1}),(s_{2},t_{2})\big{)}<\delta$ . By Assumption 2.4 the last display is smaller than $\varepsilon/2$ whenever $\delta>0$ is sufficiently small and so the almost sure uniform continuity of $L^{(2)}$ is shown.

Finally, we can combine our observations to finish the proof. Note that by the results above, also the process $L(s,t):=|L^{(1)}(s,t)+L^{(2)}(s,t)|$ is uniformly continuous with probability one. Next, recall the cutoff parameters from Assumption 2.4 and observe the identity

[TABLE]

Furthermore, note that

[TABLE]

almost surely as $m\to\infty$ . Now we can finish the proof of Lemma A.3 using the almost sure uniform continuity of $L$ , which implies that for arbitrary $\varepsilon>0$ and almost every $\omega\in\Omega$ we can choose sufficiently large $m=m(\varepsilon,\omega)$ such that

[TABLE]

∎

Combining Lemma A.1, A.2 and A.3 we have already proven that

[TABLE]

and it only remains to investigate the impact of the covariance estimator. Therefore the following Lemma finishes the proof of Theorem 2.7.

Lemma A.4 (Plug in of covariance estimator)

We have that

[TABLE]

Proof.

Observe the bound

[TABLE]

Next note that for a symmetric matrix $A$ and an arbitrary vector $v$ the Cauchy-Schwarz inequality implies

[TABLE]

and we can bound (A) by

[TABLE]

Since $\hat{\Sigma}_{m}$ is a consistent estimator of $\Sigma$ , an application of the continuous mapping theorem yields

[TABLE]

Next, the definition of the operator norm yields

[TABLE]

Now a combination of (A.17) and (A.20) implies that the expression in (A.19) is of order $o_{\mathbb{P}}(1)$ , which completes the proof of Lemma A.4 and thus also the proof of Theorem 2.7. ∎

Combining Lemmas A.1, A.2, A.3 and A.4 we have now established that

[TABLE]

and it remains to show that the distribution on the right-hand side of the last display is identical to the distribution on the right-hand side of (2.12).

Lemma A.5 (Simplify limit distribution)

It holds that

[TABLE]

where $W$ is a standard $p$ -dimensional Brownian motion.

Proof of Lemma A.5 and last step in the proof of Theorem 2.7.

In the following let $Z$ denote a vector of $p$ independent standard Gaussian random variables that is independent of $W_{1}$ . Observe that

[TABLE]

Following Horváth et al., (2004) and Fremdt, (2015), computing the covariance function implies the following identity (in distribution)

[TABLE]

Applying this to (A) yields

[TABLE]

This completes the proof of Lemma A.5 and also of Theorem 2.7. ∎

∎

Proof of Corollary 2.9.

We proceed according to the proof of Theorem 3.1 of Fremdt, (2015). Using the definition $w_{\gamma}(t)=\bigg{[}(1+t)\max\Big{\{}\Big{(}\dfrac{t}{1+t}\Big{)}^{\gamma},\,\varepsilon\Big{\}}\bigg{]}^{-1}$ , we obtain that

[TABLE]

where we used that the mapping $x\mapsto x/(1+x)$ is bijective and increasing on the domain $[0,\infty)$ with co-domain $[0,1)$ . ∎
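The resulting limit is easy to simulate. The sketch below approximates a $(1-\alpha)$-quantile for $p=1$ by Monte Carlo, assuming that the limit has the form $\sup_{0\leq t<1}\max_{0\leq s\leq t}|W(t)-W(s)|/\max\{t^{\gamma},\varepsilon\}$. This form is an assumption here, since the corresponding display is not reproduced in this section, and the function name, grid size and number of replications are illustrative choices.

```python
import numpy as np


def simulate_limit_quantile(alpha=0.05, gamma=0.0, eps=0.0,
                            n_grid=1000, n_rep=2000, seed=0):
    """Monte Carlo (1 - alpha)-quantile of the assumed limit for p = 1."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, n_grid + 1) / n_grid            # grid on (0, 1]
    denom = np.maximum(t ** gamma, eps)              # max{t^gamma, eps}
    steps = rng.normal(scale=np.sqrt(1.0 / n_grid), size=(n_rep, n_grid))
    w = np.cumsum(steps, axis=1)                     # Brownian paths, W(0) = 0
    # Running extremes over s <= t, including the starting value W(0) = 0.
    run_max = np.maximum(np.maximum.accumulate(w, axis=1), 0.0)
    run_min = np.minimum(np.minimum.accumulate(w, axis=1), 0.0)
    # max over s <= t of |W(t) - W(s)| is attained at a running extreme.
    osc = np.maximum(w - run_min, run_max - w)
    stats = np.max(osc / denom, axis=1)
    return np.quantile(stats, 1.0 - alpha)
```

Because the supremum is taken over a finite grid, the simulated quantile tends to be slightly smaller than the true one; a finer grid and more replications reduce this discretisation bias at the cost of memory and run time.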

Proof of Theorem 2.13.

For ease of reading assume in the proof that $c_{a}m,\,c_{a}k^{*}_{m}\in\mathbb{N}$ . We follow the idea of Stoehr, (2019) and distinguish the cases $k_{m}^{*}/m=O(1)$ and $k_{m}^{*}/m\to\infty$ . In the first case, observe the lower bounds

[TABLE]

Note that by Assumptions 2.4 and 2.11 it holds that $w\Big{(}\tfrac{k_{m}^{*}}{m}+c_{a}\Big{)}=\tilde{w}\Big{(}\tfrac{k_{m}^{*}}{m}+c_{a}\Big{)}$ since $\tfrac{k_{m}^{*}}{m}+c_{a}\in(t_{w},T_{w})$ . Thereby, the last display equals

[TABLE]

Using the reverse triangle inequality, the last display is bounded from below by

[TABLE]

To examine the first factor of the last display, note that we have $k^{*}_{m}/m\leq C$ for all $m\in\mathbb{N}$ and a sufficiently large constant $C$ . Using Assumption 2.4, we now obtain

[TABLE]

Now it remains to treat the second factor in (A.22). Note that by Assumption 2.11, we obtain that

[TABLE]

Using also the linearization in equation (2.7) and (2.17), we conclude that

[TABLE]

and

[TABLE]

Putting all together and using also that $\hat{\Sigma}_{m}$ is (weakly) convergent with non-singular limit the treatment of the first case is finished since (A.22) diverges to $\infty$ .

It remains to treat the case $k_{m}^{*}/m\to\infty$ , for which we can employ very similar arguments. Setting $k=k_{m}^{*}(1+c_{a})$ and $j=k_{m}^{*}-1$ in the definition of $w(k/m)\hat{E}_{m}(k)$ gives the lower bound

[TABLE]

As $T_{w}=\infty$ by assumption, we have $w\Big{(}\tfrac{k_{m}^{*}}{m}(1+c_{a})\Big{)}=\tilde{w}\Big{(}\tfrac{k_{m}^{*}}{m}(1+c_{a})\Big{)}$ for $m$ sufficiently large. Now we obtain that (A.23) has the lower bound

[TABLE]

By assumption (2.19) we obtain

[TABLE]

Using (2.18) and repeating the corresponding steps from the first case, it follows that

[TABLE]

Combining the last two statements with the lower bound provided in (A.24) the treatment of the second case and thereby the proof of Theorem 2.13 is finished. ∎

Appendix B Additional simulation results for Section 4

In this section we provide some additional simulation results complementing the discussion on the power of the different monitoring procedures in Section 4. The simulation settings are identical to those used in Section 4 and as the results below are very similar to the results displayed in Figures 1 and 2 we omit a further discussion here.

Appendix C Closed-end scenarios

It is worthwhile to mention that the theory developed in Section 2 also covers the case of closed-end scenarios [sometimes also called finite time horizon]. In this section, we will very briefly discuss this situation and present a small batch of simulation results, which also indicate the superiority of the statistic $\hat{E}$ for closed-end scenarios. Note that the null hypothesis in this setup is given by

[TABLE]

which is tested against the alternative that the parameter changes (once) at some time $m+1\leq m+k^{\star}\leq(T+1)m$ , that is

[TABLE]

Here the factor $T\in\mathbb{N}$ controls the length of the monitoring period compared to the size of the initial data set. Under the assumptions stated in Section 2, we can prove a corresponding statement of Theorem 2.7 and Corollary 2.9.

Theorem C.1

Assume that the null hypothesis (C.1) and Assumptions 2.3 - 2.5 hold. If further $\hat{\Sigma}_{m}$ is a consistent and non-singular estimator of the long-run variance matrix $\Sigma$ it holds that

[TABLE]

where $W$ is a $p$ -dimensional Brownian motion with independent components. Using $w=w_{\gamma}$ for the class of weight functions defined in (2.14), we further obtain that

[TABLE]

The proof of Theorem C.1 follows from Theorem 2.7 by using the factor $T$ as the cutoff $T_{w}$ of the weight function in (2.10). The representation provided in (C.4) follows from a straightforward adaptation of Corollary 2.9. The corresponding results for the tests based on the statistics $\hat{Q}$ and $\hat{P}$ defined in (4.1) read as follows

[TABLE]

and

[TABLE]

As in Remark 2.10, we can obtain an exact formula for the distribution of $L_{1,0}(T)$ in the case $p=1$ from page 146 of Borodin and Salminen, (1996), namely

[TABLE]

where $\Phi$ denotes the c.d.f. of a standard Gaussian random variable and $q(T)$ denotes the quotient $T/(T+1)$ .

To complete the discussion on closed-end scenarios we will display a small batch of simulation results for the detection of changes in the mean as described in Section 3.1. For the sake of brevity, only the choice $T=4$ is examined here [unpublished simulation results show similar outcomes for other choices of $T$ ]. The remaining simulation settings are the same as used for the simulation study presented in Section 4.1 and in Table 5 we display the necessary critical values defining the rejection regions for the different procedures.

The approximation of the nominal level under the null hypothesis is displayed in Tables 6 and 7 and in Figures 8 and 9 the power of the different procedures with respect to change amount and change position for $\gamma=0$ is illustrated. The results are very similar to the open-end scenario discussed in Section 4 and confirm the findings of that Section.


Bibliography

Anatolyev, S. and Kosenok, G. (2018). Sequential testing with uniformly distributed size. Journal of Time Series Econometrics, 10(2).
Andrews, D. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica, 59(3):817–858.
Aue, A., Dienes, C., Fremdt, S., and Steinebach, J. (2015). Reaction times of monitoring schemes for ARMA time series. Bernoulli, 21(2):1238–1259.
Aue, A., Hörmann, S., Horváth, L., Hušková, M., and Steinebach, J. G. (2012). Sequential testing for the stability of high-frequency portfolio betas. Econometric Theory, 28(4):804–837.
Aue, A., Hörmann, S., Horváth, L., and Reimherr, M. (2009a). Break detection in the covariance structure of multivariate time series models. The Annals of Statistics, 37(6B):4046–4087.
Aue, A. and Horváth, L. (2004). Delay time in sequential detection of change. Statistics & Probability Letters, 67(3):221–231.
Aue, A. and Horváth, L. (2013). Structural breaks in time series. Journal of Time Series Analysis, 34(1):1–16.
Aue, A., Horváth, L., Hušková, M., and Kokoszka, P. (2006). Change-point monitoring in linear models. The Econometrics Journal, 9(3):373–403.
